Consider the Bellman optimality equation for the value function. If we use an iterative algorithm to compute the optimal value function directly from this equation, will it converge to the same optimal value function? I see that books talk about policy/value iteration algorithms, which alternate policy evaluation and policy improvement steps, and these are shown to converge. Why do the books not do the obvious thing of iterating the optimality equation directly? There must be some reason.
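For concreteness, the "obvious thing" described above is what is usually called value iteration: repeatedly applying the Bellman optimality operator until the values stop changing. A minimal sketch on a hypothetical 2-state, 2-action MDP (all transition probabilities, rewards, and the discount factor below are illustrative, not taken from any book):

```python
import numpy as np

# Hypothetical toy MDP: P[s, a, s'] are transition probabilities,
# R[s, a] are expected rewards; both are assumed known.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0 under actions 0, 1
    [[0.0, 1.0], [0.5, 0.5]],   # transitions from state 1 under actions 0, 1
])
R = np.array([
    [1.0, 0.0],                 # rewards in state 0 for actions 0, 1
    [0.0, 2.0],                 # rewards in state 1 for actions 0, 1
])
gamma = 0.9                     # discount factor

# Value iteration: apply the Bellman optimality operator
#   V(s) <- max_a [ R(s, a) + gamma * sum_{s'} P(s, a, s') V(s') ]
# until successive iterates are (numerically) equal.
V = np.zeros(2)
for _ in range(10_000):
    Q = R + gamma * P @ V       # Q[s, a], action values under current V
    V_new = Q.max(axis=1)       # greedy backup over actions
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

greedy_policy = Q.argmax(axis=1)  # optimal policy read off the fixed point
print(V, greedy_policy)
```

Because the operator is a gamma-contraction in the max norm, this iteration converges to the unique fixed point, i.e. the optimal value function, from any starting point.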
asked by sosha