Suppose that for a POMDP we have computed a deterministic optimal policy (i.e. a mapping from belief states to actions). Now I want to apply it: I have an observation and want to choose the optimal action. What should I do? How do I obtain the belief state vector from the observation?

asked 16 Jun '14, 02:10 by sosha

Your belief state vector needs to be updated every time you receive an observation. As you mentioned before, it is a simple process.

The initial belief can be distributed uniformly over the states consistent with the first observation. Then, after each action and observation, you improve the belief (the probability vector over the current state) with a Bayes update.

The reference I pointed you to before contains a simple example of belief updating (the room-hopping problem).
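To make the update concrete, here is a minimal sketch of the Bayes belief update in Python/NumPy. The array layouts (`T[a, s, s']` for transitions, `O[a, s', o]` for observation probabilities) and the two-state sensor example are my own assumptions for illustration, not something specified in the question:

```python
import numpy as np

def update_belief(b, a, o, T, O):
    """One Bayes update of a POMDP belief vector.

    b : current belief over states, shape (S,)
    a : index of the action just taken
    o : index of the observation just received
    T : transition model, T[a, s, s'] = P(s' | s, a), shape (A, S, S)
    O : observation model, O[a, s', o] = P(o | s', a), shape (A, S, num_obs)
    """
    # Predict: push the belief through the transition model for action a.
    predicted = b @ T[a]                  # shape (S,)
    # Correct: weight each state by the likelihood of the observation,
    # then renormalize so the belief sums to 1.
    unnormalized = O[a, :, o] * predicted
    return unnormalized / unnormalized.sum()

# Hypothetical two-state example: the state mostly persists (90%),
# and a noisy sensor reports the true state 80% of the time.
T = np.array([[[0.9, 0.1],
               [0.1, 0.9]]])             # single action
O = np.array([[[0.8, 0.2],
               [0.2, 0.8]]])
b0 = np.array([0.5, 0.5])                # uniform initial belief
b1 = update_belief(b0, a=0, o=0, T=T, O=O)
# Observing o=0 shifts the belief toward state 0: b1 == [0.8, 0.2]
```

At decision time you feed the updated belief `b1` (not the raw observation) into the policy, since the policy maps belief states to actions.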


answered 18 Jun '14, 10:42 by ksphil



OR-Exchange! Your site for questions, answers, and announcements about operations research.