AI A-Z

Udemy - Artificial Intelligence: Hadelin de Ponteves and Kirill Eremenko.

Notes

1. Reinforcement learning (like dog training: reward +1, penalty -1)
2. Bellman equation (choose the action that maximizes reward from the current position; states closer to the goal get a value closer to 1)
3. Markov process (future states depend only on the present state, not on past events)
4. Markov decision process (models outcomes that are partly random and partly under the control of the decision maker)
5. Policy vs. plan (a policy takes randomness into account; a plan in this context assumes
  deterministic actions based on the best potential action)
6. Living penalty (small negative reward for each step; can change the policy)
7. Q-learning (Q quantifies the value of an 'action', vs. V for the value of a state)
8. Temporal difference (difference in Q between before and after an iteration); TD eventually approaches zero
9. 'Q' is stable when TD is zero
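
Items 6 to 9 above can be sketched in a few lines. This is only a minimal illustration, not the course's code: the 5-cell corridor, the reward values, and the names step, train, GAMMA, ALPHA and LIVING_PENALTY are all assumptions made up for the example.

```python
import random

# Hypothetical 1-D corridor: states 0..4, goal at state 4 (reward +1).
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)        # index 0 = move left, index 1 = move right
GAMMA = 0.9               # discount factor (assumed value)
ALPHA = 0.5               # learning rate (assumed value)
LIVING_PENALTY = -0.04    # small negative reward for every non-goal step

def step(state, action):
    """Deterministic move; the walls clamp the position."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else LIVING_PENALTY
    return nxt, reward

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state = rng.randrange(GOAL)              # start anywhere except the goal
        while state != GOAL:
            a = rng.randrange(len(ACTIONS))      # random actions, for simplicity
            nxt, reward = step(state, ACTIONS[a])
            # Temporal difference: (reward + discounted best next Q) - current Q
            best_next = 0.0 if nxt == GOAL else max(q[nxt])
            td = reward + GAMMA * best_next - q[state][a]
            q[state][a] += ALPHA * td            # Q stops changing once TD is zero
            state = nxt
    return q

q_table = train()
policy = ["left" if left > right else "right" for left, right in q_table[:GOAL]]
print(policy)
```

In this deterministic toy world the TD term shrinks toward zero and the Q values settle; states closer to the goal end up with values closer to 1.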
-----------------Neural networks
1. see figures below
2. training a network by adjusting weights
3. Once trained, the output is more precise
4. (Batch) gradient descent (changes weights based on the slope (derivative) of the cost function; only guaranteed to reach the global minimum when the cost function is convex)
5. Stochastic gradient descent (adjusts the weights after each row, rather than after all rows)
6. Batch is 'deterministic' (same final weights given the same training data and starting weights), but less efficient and prone to getting stuck in local minima
7. 'Mini-batch' (runs several rows at a time, but not all of them) can avoid the 'correlation of sequential experiences' issue
8. Backpropagation (adjusts all weights at the same time)
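
The difference between items 4, 5 and 7 above is only how many rows are used per weight update. A rough sketch, assuming a single weight w, a made-up data set generated from y = 2x, and hypothetical helper names gradient and fit:

```python
import random

# Hypothetical training rows (x, y) generated from y = 2x, so the ideal weight is 2.0.
ROWS = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

def gradient(w, rows):
    """Mean derivative of the cost 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in rows) / len(rows)

def fit(batch_size, epochs=200, lr=0.02, seed=0):
    rng = random.Random(seed)
    rows = list(ROWS)
    w = 0.0                                    # starting weight
    for _ in range(epochs):
        rng.shuffle(rows)                      # new row order every epoch
        for i in range(0, len(rows), batch_size):
            # one weight update per group of batch_size rows
            w -= lr * gradient(w, rows[i:i + batch_size])
    return w

w_batch = fit(batch_size=len(ROWS))   # batch: one update after all rows
w_sgd = fit(batch_size=1)             # stochastic: one update after each row
w_mini = fit(batch_size=2)            # mini-batch: one update after every 2 rows
print(w_batch, w_sgd, w_mini)
```

The cost here is convex, so all three variants end up at about the same weight; the local-minimum behavior mentioned above only shows up with non-convex cost functions.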

-------------------Deep Q learning intuition
  
1. Place the grid from Q-learning on x/y axes and feed the state (x, y) to the network
2. Network generates four Q values (one for each direction)
3. Weights of the nodes are updated, generating new Q values
4. 'L' (loss: difference between Q and target Q) is reduced (similar to what was done with TD above)
5. Pass the Q values through 'softmax', which selects the action (usually the one with the best Q)
6. Concepts: 'experience replay' and 'sequential order of experiences' (habit), 'experience in batches' (rolling window)
7. Action selection policies to avoid getting stuck in a local maximum (epsilon greedy, epsilon soft, softmax)
--epsilon greedy (exploit most of the time, but sometimes explore)
--epsilon soft (more exploration)
--softmax (more sophisticated version of greedy)
8. exploration vs. exploitation
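
Two of the action selection policies from item 7 above, as a minimal sketch. The Q values, the temperature parameter and the function names are assumptions for illustration; in deep Q-learning the four Q values would come from the network's output layer.

```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Exploit the best action with probability 1 - epsilon, otherwise explore."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))               # explore: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

def softmax_probs(q_values, temperature=1.0):
    """Turn Q values into probabilities that sum to 1."""
    top = max(q_values)                                   # subtract max for stability
    exps = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_action(q_values, rng, temperature=1.0):
    """Sample an action; higher Q means higher chance, but never a certainty."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

rng = random.Random(0)
q = [0.1, 0.5, 0.2, 0.9]      # hypothetical Q values for the four directions
probs = softmax_probs(q)
greedy_choice = epsilon_greedy(q, epsilon=0.0, rng=rng)   # epsilon 0 = always exploit
sampled = [softmax_action(q, rng) for _ in range(1000)]
print(probs, greedy_choice)
```

With epsilon greedy, a larger epsilon means more exploration. With softmax, the best action is picked most often, and the other actions are still tried in proportion to their Q values.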

Images

  https://i.imgur.com/64PDufU.png, Q equation simplified
  https://i.imgur.com/ArOSuba.png, Q equation with temporal difference
  https://i.imgur.com/unrJWDn.png, temporal difference equation
  , V----------------Neural networks-----------------------V
  https://i.imgur.com/JVZ72rZ.png, threshold function
  https://i.imgur.com/pt7BzRD.png, sigmoid activation function
  https://i.imgur.com/w1pGqMB.png, rectifier function
  https://i.imgur.com/H4Pkth4.png, hyperbolic tangent function
  https://i.imgur.com/T1vuJnx.png, neuron model
  https://i.imgur.com/Z1WFABw.png, one common application of functions above
  https://i.imgur.com/1m9a0rj.png, set up of network neuron (one row; compare output to actual; generate cost function)
  https://i.imgur.com/cjN6SXf.png, 8 rows; when the weights are adjusted they are the same for each row; this is also referred to as backpropagation
  https://i.imgur.com/b4MB71d.png, gradient descent
  https://i.imgur.com/aCwN7XJ.png, 3D gradient descent
  https://i.imgur.com/KExEkH4.png, 3D gradient descent
  https://i.imgur.com/iHEoOd9.png, gradient descent with non-convex function
  https://i.imgur.com/U45krjp.png, stochastic vs batch gradient descent
  https://i.imgur.com/z813BNt.png, backpropagation
  https://i.imgur.com/2mahRWe.png, summary of training using stochastic gradient descent and backpropagation
 , V________________________________Deep Q Learning___________________________V
  https://i.imgur.com/NLXcv1B.png, superimposing Q-learning grid onto x/y axis and using that to feed to neural net
  https://i.imgur.com/ObcD9E5.png, Qs being adjusted, and L going down
  https://i.imgur.com/2XeOGLF.png, adding 'acting' component involving passing through softmax
  https://i.imgur.com/Y2YOwmB.png, action selection policies
  https://i.imgur.com/FujGtaA.png, softmax
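
The activation functions and the neuron model pictured in the neural network images above can be written out roughly as follows. This is a sketch only: it assumes the 'hyperbolic' image refers to the hyperbolic tangent, and the neuron helper, inputs and weights are hypothetical.

```python
import math

def threshold(x):
    """Step function: 1 once the weighted sum reaches 0, else 0."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """Smooth curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def rectifier(x):
    """0 for negative input, the input itself otherwise."""
    return max(0.0, x)

def hyperbolic_tangent(x):
    """Smooth curve between -1 and 1 (assumed meaning of 'hyperbolic function')."""
    return math.tanh(x)

def neuron(inputs, weights, activation):
    """Weighted sum of the inputs passed through an activation function."""
    return activation(sum(w * x for w, x in zip(weights, inputs)))

# Hypothetical inputs and weights: weighted sum = 1.0*0.5 + 2.0*(-0.25) = 0.0
out = neuron([1.0, 2.0], [0.5, -0.25], sigmoid)
print(out)
```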
