Udemy - Artificial Intelligence: Hadelin de Ponteves and Kirill Eremenko.

Notes

Links

Images

grid for bellman eq
basic grid with agent and reward structure

'plan-based' values derived from the Bellman equation
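
The image itself is not reproduced here, so the following is only a minimal sketch of how 'plan-based' values can be computed with the deterministic Bellman equation V(s) = max_a [R(s, a) + gamma * V(s')]. The 3x4 layout, the wall position, the +1 and -1 terminal rewards, gamma = 0.9 and all function names are assumptions made for illustration.

```python
# Sketch only: grid layout, rewards and GAMMA are assumptions, not read from the course images.
GAMMA = 0.9
ROWS, COLS = 3, 4
WALL = {(1, 1)}                             # blocked square
TERMINAL = {(0, 3): 1.0, (1, 3): -1.0}      # reward collected on entering the square
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, move):
    """Deterministic move; bumping into the wall or the edge leaves the agent in place."""
    row, col = state[0] + move[0], state[1] + move[1]
    if not (0 <= row < ROWS and 0 <= col < COLS) or (row, col) in WALL:
        return state
    return (row, col)


def plan_values(sweeps=50):
    """Apply V(s) = max_a [R(s, a) + GAMMA * V(s')] repeatedly until the values settle."""
    values = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS) if (r, c) not in WALL}
    for _ in range(sweeps):
        for state in values:
            if state in TERMINAL:
                continue  # the episode ends here, so there is no future value
            values[state] = max(
                TERMINAL.get(step(state, move), 0.0) + GAMMA * values[step(state, move)]
                for move in MOVES
            )
    return values


if __name__ == "__main__":
    values = plan_values()
    for r in range(ROWS):
        print(" ".join(f"{values.get((r, c), float('nan')):6.2f}" for c in range(COLS)))
```

Under these assumed numbers the square next to the +1 reward gets value 1.0, the one before it 0.9, then 0.81, and so on: each step away from the reward multiplies the value by gamma.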

Bellman equation with allowance for different future states
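
Since the image is not available here, this is the standard textbook form of that equation, written out as a reference. The symbols (R for reward, gamma for the discount factor, P for the probability of landing in s' after taking action a in state s) are the usual ones and are assumed to match the course notation.

```latex
% Deterministic case: action a always leads to one known next state s'
V(s) = \max_{a} \bigl( R(s,a) + \gamma \, V(s') \bigr)

% Allowing for different future states: average over every possible s'
V(s) = \max_{a} \Bigl( R(s,a) + \gamma \sum_{s'} P(s,a,s') \, V(s') \Bigr)
```

The deterministic form is the special case in which P(s,a,s') is 1 for a single next state and 0 for all others.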

'policy-based' values of grid squares (note discounting for randomness and for bad outcomes)

impact of different levels of negative reward on policy
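
A rough sketch of how that comparison could be reproduced: solve the same grid for several per-move rewards and compare the resulting policies. Everything specific here is an assumption for illustration: the 3x4 layout, the 80% / 10% / 10% split between the intended move and the two sideways slips, gamma = 0.9, the name `step_reward` and the reward values tried. Rewards are attached to the square being entered, which is one common convention and may differ from the course's.

```python
# Sketch only: layout, slip probabilities, GAMMA and reward levels are assumptions.
GAMMA = 0.9
ROWS, COLS = 3, 4
WALL = {(1, 1)}
TERMINAL = {(0, 3): 1.0, (1, 3): -1.0}  # reward collected on entering the square
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIDEWAYS = {"up": ("left", "right"), "down": ("left", "right"),
            "left": ("up", "down"), "right": ("up", "down")}


def step(state, action):
    """Move one square; bumping into the wall or the edge leaves the agent in place."""
    d_row, d_col = MOVES[action]
    row, col = state[0] + d_row, state[1] + d_col
    if not (0 <= row < ROWS and 0 <= col < COLS) or (row, col) in WALL:
        return state
    return (row, col)


def q_value(values, state, action, step_reward):
    """Expected reward plus discounted next value, averaged over the possible slips."""
    first, second = SIDEWAYS[action]
    total = 0.0
    for prob, actual in ((0.8, action), (0.1, first), (0.1, second)):
        nxt = step(state, actual)
        reward = TERMINAL.get(nxt, step_reward)  # non-terminal squares cost step_reward
        total += prob * (reward + GAMMA * values[nxt])
    return total


def solve(step_reward, sweeps=200):
    """Value iteration; returns the values and the greedy policy for one reward level."""
    values = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS) if (r, c) not in WALL}
    for _ in range(sweeps):
        for state in values:
            if state not in TERMINAL:
                values[state] = max(q_value(values, state, a, step_reward) for a in MOVES)
    policy = {
        state: max(MOVES, key=lambda a: q_value(values, state, a, step_reward))
        for state in values if state not in TERMINAL
    }
    return values, policy


if __name__ == "__main__":
    for step_reward in (0.0, -0.04, -0.5, -2.0):
        _, policy = solve(step_reward)
        print(step_reward, policy)
```

Printing the policy for each reward level shows which squares change their preferred move as the per-move reward becomes more negative.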

V and Q equations
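
For reference, the usual way these two quantities are written, assuming the standard notation (the image is not reproduced here, so the exact symbols used in the course may differ):

```latex
% Value of a state: the best outcome achievable from s
V(s) = \max_{a} \Bigl( R(s,a) + \gamma \sum_{s'} P(s,a,s') \, V(s') \Bigr)

% Value of taking action a in state s: the bracketed term for one specific action
Q(s,a) = R(s,a) + \gamma \sum_{s'} P(s,a,s') \, V(s')

% Hence the state value is the best of its action values
V(s) = \max_{a} Q(s,a)
```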

Q equation after substitution (accounting for V being recursive)
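
A sketch of that substitution in standard notation, assuming the usual definitions of R, gamma and P (the course image may use slightly different symbols). Because V(s') is itself the maximum of Q over the actions available in s', it can be replaced inside the Q equation, leaving an expression in Q alone:

```latex
% Start from Q expressed through V
Q(s,a) = R(s,a) + \gamma \sum_{s'} P(s,a,s') \, V(s')

% Substitute V(s') = \max_{a'} Q(s',a')
Q(s,a) = R(s,a) + \gamma \sum_{s'} P(s,a,s') \, \max_{a'} Q(s',a')
```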