Suppose that for an MDP all the transitions are deterministic. Can there be any other easy algorithm (other than value iteration/policy iteration) to calculate the optimal policy?

asked 20 Jun '14, 11:35

sosha


The only advantage of deterministic transitions is that, for finite-horizon problems, you can run value iteration either forward or backward in time. Bellman's book on dynamic programming has several examples of Markov decision processes with deterministic transition functions.
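As a sketch of backward induction on a deterministic finite-horizon problem (the model here — 5 states on a line, moves of -1/0/+1, reward equal to the state you land on, horizon T = 3 — is a made-up toy example, not from the answer):

```python
# Backward induction for a deterministic finite-horizon problem.
# Hypothetical toy model: states 0..4, actions move -1/0/+1 (clipped
# to the state space), reward equals the state reached, horizon T = 3.
T = 3
states = range(5)
actions = (-1, 0, 1)

def step(s, a):
    # Deterministic transition: the next state is known exactly.
    return min(max(s + a, 0), 4)

V = {s: 0.0 for s in states}  # terminal values
policy = []                   # policy[t][s] = action at stage t
for _ in range(T):
    newV, pi = {}, {}
    for s in states:
        # No expectation: evaluate each action at its unique successor.
        best = max(actions, key=lambda a: step(s, a) + V[step(s, a)])
        pi[s] = best
        newV[s] = step(s, best) + V[step(s, best)]
    V, policy = newV, [pi] + policy

print(V[0])  # optimal total reward starting from state 0
```

Running the recursion forward instead of backward would work equally well here, precisely because each action pins down its successor state.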


answered 24 Feb '16, 00:49

adityam

Nothing different: the deterministic case is a special case of the stochastic case.

Usually we don't call a deterministic problem an MDP, even though technically it is not wrong to do so; we just call it dynamic programming (DP includes MDPs).

For a finite-horizon problem, solve by backward induction.

For an infinite-horizon problem, solve by value or policy iteration.

When transitions are deterministic, the calculation is much easier, since you don't need to compute an expectation of the next-period value function; you just add the value of the single successor state instead.
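To make that point concrete, here is a single value-iteration backup written both ways (the two-state values, reward, and transition probabilities are invented for illustration):

```python
# One Bellman backup, stochastic vs. deterministic (toy numbers).
# Hypothetical 2-state problem with discount gamma = 0.9.
gamma = 0.9
V = [10.0, 20.0]  # current value estimates for states 0 and 1
r = 1.0           # immediate reward of the action considered

# Stochastic: expectation over next states weighted by P(s'|s,a).
P = [0.3, 0.7]
q_stochastic = r + gamma * sum(p * V[s2] for s2, p in enumerate(P))

# Deterministic: the "expectation" is just the one successor's value.
succ = 1
q_deterministic = r + gamma * V[succ]

print(q_stochastic)     # 1 + 0.9 * (0.3*10 + 0.7*20), about 16.3
print(q_deterministic)  # 1 + 0.9 * 20 = 19.0
```

The deterministic backup is one lookup and one addition per action, which is why each sweep of value iteration gets cheaper.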


answered 26 Feb '16, 23:25

ksphil



OR-Exchange! Your site for questions, answers, and announcements about operations research.