Consider the policy iteration algorithm for a finite-state MDP, and suppose the initial policy is stochastic. Can the optimal policy be deterministic after the improvement steps? Or will the optimal policy always be a stochastic one? I am confused about this, so any ideas would be helpful. The reason I ask is that in the absence of a model, i.e. when we need to use Monte Carlo methods, each improved policy must be stochastic so that the action-value function estimates are close to their true means.

asked 15 Jun '14, 09:04

sosha

edited 15 Jun '14, 09:17

I'm afraid I may be misunderstanding your question again.

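An optimal policy doesn't have to be stochastic. A deterministic policy can be regarded as a special case of a stochastic policy: one that puts probability 1 on a single action in each state. For a finite MDP under the usual discounted-reward criterion, at least one optimal policy is deterministic. Beyond that setting it mainly depends on the problem; constrained or partially observable problems, for example, can call for randomization.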

Sometimes, even if we start from a stochastic policy, the resulting optimal policy is deterministic (it selects one action with probability 1).
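To make this concrete, here is a minimal sketch of policy iteration on a made-up two-state, two-action MDP, started from a uniformly random policy. The transition table `P`, the rewards, and the discount factor 0.9 are all assumptions chosen for illustration, not anything from the original question. The point is only that the greedy improvement step puts probability 1 on a single action in each state, so the policy is deterministic after the first improvement.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical toy MDP: P[s][a] = list of (probability, next_state, reward)
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}

def q_value(s, a, v):
    """Expected one-step return of action a in state s, given values v."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])

def evaluate(pi, tol=1e-10):
    """Iterative policy evaluation for a (possibly stochastic) policy pi[s][a]."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            new = sum(pi[s][a] * q_value(s, a, v) for a in P[s])
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v

def improve(v):
    """Greedy improvement: all probability mass goes to one best action."""
    pi = {}
    for s in P:
        best = max(P[s], key=lambda a: q_value(s, a, v))
        pi[s] = {a: 1.0 if a == best else 0.0 for a in P[s]}
    return pi

def policy_iteration(pi):
    """Alternate evaluation and improvement until the policy stops changing."""
    while True:
        v = evaluate(pi)
        new_pi = improve(v)
        if new_pi == pi:
            return pi, v
        pi = new_pi

# Start from a stochastic policy: each action with probability 0.5.
uniform = {s: {a: 0.5 for a in P[s]} for s in P}
pi_star, v_star = policy_iteration(uniform)
print(pi_star)
```

Ties between actions are broken here by taking the first maximizer; when several actions tie, any mixture over them would be equally good, which is how stochastic optimal policies can coexist with a deterministic one.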

Monte Carlo methods can also be used for approximate dynamic programming. MDPs are typically solved by dynamic programming; when there are too many possible transitions (or observations) to enumerate, we estimate the value function with Monte Carlo sampling instead. You may want to look into approximate dynamic programming (see the book by Prof. Powell).
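On the model-free concern in the question, one common arrangement (an illustration, not the only option) is to keep the policy used for sampling epsilon-greedy so that every action keeps being tried, and to read off a deterministic greedy policy from the estimated action values at the end. The sketch below does this with Monte Carlo control on a hypothetical two-state episodic problem; the `step` function, its rewards, the value of epsilon, and the episode count are all assumptions made for the example.

```python
import random

TERMINAL = None
ACTIONS = (0, 1)

def step(state, action, rng):
    """Hypothetical toy environment: returns (next_state, reward)."""
    if state == 0:
        return (TERMINAL, 0.5) if action == 0 else (1, 0.0)
    # state == 1
    if action == 0:
        return TERMINAL, 0.0
    return TERMINAL, (1.0 if rng.random() < 0.9 else 0.0)

def epsilon_greedy(q, state, epsilon, rng):
    """Stochastic behaviour policy: explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def mc_control(episodes=20000, epsilon=0.2, seed=0):
    """Monte Carlo control with sample-average action-value estimates."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in ACTIONS}
    n = {key: 0 for key in q}
    for _ in range(episodes):
        state, trajectory = 0, []
        while state is not TERMINAL:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward = step(state, action, rng)
            trajectory.append((state, action, reward))
            state = next_state
        g = 0.0
        for s, a, r in reversed(trajectory):
            g += r  # undiscounted return from this step onward
            n[(s, a)] += 1
            q[(s, a)] += (g - q[(s, a)]) / n[(s, a)]  # running mean
    return q

q = mc_control()
# The policy read off from the estimates is deterministic.
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (0, 1)}
print(greedy)
```

Note that the estimates here are the action values of the epsilon-greedy policy itself, so they differ slightly from the optimal action values; the greedy policy derived from them can still be deterministic.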


answered 18 Jun '14, 10:32

ksphil


OR-Exchange! Your site for questions, answers, and announcements about operations research.