Suppose that for a POMDP we have computed a deterministic optimal policy (i.e., a mapping from belief states to actions). Now I want to apply it: I receive an observation and want to choose the optimal action. What should I do? How do I obtain the belief state vector from the observation?

asked 16 Jun '14, 02:10 by sosha

Your belief state vector needs to be updated every time you get an observation. As you mentioned before, it is a simple process.

The initial belief is uniformly distributed and then conditioned on the first observation. After that, each action taken and observation received is used to update the belief (the probability vector over the current state).
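Concretely, the update is b'(s') ∝ P(o | s', a) · Σ_s P(s' | s, a) · b(s), followed by normalization. Below is a minimal Python/NumPy sketch of that step; the array layout (`T[a][s, s']` for the transition model, `O[a][s', o]` for the observation model) and the toy numbers are assumptions made up for illustration, so adapt them to however your model is stored.

```python
import numpy as np

def update_belief(belief, action, observation, T, O):
    """
    Bayesian belief update for a POMDP.

    belief      : length-|S| probability vector over states
    action      : index of the action just taken
    observation : index of the observation just received
    T           : transition model, T[a][s, s'] = P(s' | s, a)
    O           : observation model, O[a][s', o] = P(o | s', a)
    """
    # Prediction step: push the current belief through the transition model.
    predicted = belief @ T[action]                    # P(s' | b, a)
    # Correction step: weight each state by the observation likelihood.
    unnormalized = predicted * O[action][:, observation]
    # Normalize so the belief sums to 1.
    return unnormalized / unnormalized.sum()

# Tiny 2-state, 1-action, 2-observation example (numbers invented for illustration).
T = [np.array([[0.7, 0.3],
               [0.2, 0.8]])]          # T[0][s, s'] = P(s' | s, a=0)
O = [np.array([[0.9, 0.1],
               [0.3, 0.7]])]          # O[0][s', o] = P(o | s', a=0)
b = np.array([0.5, 0.5])              # start from a uniform belief
b = update_belief(b, action=0, observation=1, T=T, O=O)
```

Once the belief vector is updated, choosing the action is just applying your policy to the new belief (e.g., if the policy is represented by alpha-vectors, take the action of the vector that maximizes the dot product with the belief).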

If you read the reference I mentioned before (http://www.ai.mit.edu/courses/6.825/pdf/pomdp.pdf), you can see a simple example of belief updating (the room-hopping problem).

answered 18 Jun '14, 10:42 by ksphil
