Bellman equation with allowance for different future states
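
The equation itself is not reproduced in these notes. As a reminder, the standard form that allows for several possible future states is sketched below. The symbols are assumptions following common textbook notation: P is the transition probability, R the reward, and \gamma the discount factor.

```latex
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{*}(s') \bigr]
```

The sum over s' is what accounts for the different future states: each possible outcome of action a contributes its reward plus discounted future value, weighted by how likely it is.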
'Policy-based' values of grid squares (note the discounting for randomness and for bad outcomes)
Impact of different levels of negative reward on the policy
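
To make the effect concrete, here is a minimal value-iteration sketch on a hypothetical 3x4 gridworld. Everything in it is an assumption chosen for illustration rather than taken from these notes: the layout, the 0.8/0.1/0.1 slip probabilities, the discount of 0.9, the +1 and -1 exits, and all names such as `value_iteration` and `step_reward`.

```python
# Hypothetical 3x4 gridworld (an assumption, not taken from the notes).
ROWS, COLS = 3, 4
WALL = {(1, 1)}
TERMINAL = {(0, 3): 1.0, (1, 3): -1.0}  # exit squares and their fixed values
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
# Each intended action can slip to one of its two perpendicular directions.
SLIPS = {"U": ("L", "R"), "D": ("L", "R"), "L": ("U", "D"), "R": ("U", "D")}


def move(state, action):
    """Deterministic move; bumping into a wall or the edge leaves the state unchanged."""
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (r + dr, c + dc)
    if nxt in WALL or not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS):
        return state
    return nxt


def transitions(state, action, noise=0.2):
    """Return (probability, next_state) pairs: 1 - noise intended, noise / 2 per slip."""
    out = [(1 - noise, move(state, action))]
    for slip in SLIPS[action]:
        out.append((noise / 2, move(state, slip)))
    return out


def q_value(V, state, action, step_reward, gamma):
    """Expected return of one action: sum over outcomes of p * (reward + gamma * V)."""
    return sum(p * (step_reward + gamma * V[s2])
               for p, s2 in transitions(state, action))


def value_iteration(step_reward, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality backup until values change by less than tol."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS)
              if (r, c) not in WALL]
    V = {s: TERMINAL.get(s, 0.0) for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in TERMINAL:
                continue  # exit values stay fixed
            best = max(q_value(V, s, a, step_reward, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a, step_reward, gamma))
              for s in states if s not in TERMINAL}
    return V, policy


if __name__ == "__main__":
    for step_reward in (-0.01, -0.5, -5.0):
        _, policy = value_iteration(step_reward)
        print(f"step reward {step_reward}:")
        for r in range(ROWS):
            print("  " + " ".join(policy.get((r, c), "#" if (r, c) in WALL else "*")
                                  for c in range(COLS)))
```

In this sketch, a mild step penalty should make the square beside the -1 exit steer up and away from it, while a harsh penalty should make the same square head straight for the nearest exit, even though that exit is the bad one.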
V and Q equations
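
The two equations are not written out in these notes. In their usual optimal-value form they can be sketched as follows; the notation (P for transition probabilities, R for rewards, \gamma for the discount factor) is assumed rather than taken from the source.

```latex
V^{*}(s) = \max_{a} Q^{*}(s, a)

Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{*}(s') \bigr]
```

V is the value of a state when the best action is taken from it, and Q is the value of committing to one particular action first and acting optimally afterwards.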
Q equation after substitution (accounting for V being recursive)
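
A sketch of that substitution, using assumed standard notation (P, R and \gamma are the transition probability, reward and discount factor): since the value of a state is the best Q-value available in it, V^{*}(s') = \max_{a'} Q^{*}(s', a'), the V term inside Q can be replaced so that Q is defined purely in terms of itself.

```latex
Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \bigr]
```

The recursion is now explicit: the Q-value of a state-action pair depends on the Q-values of the states that can follow it.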