Recap: Defining MDP
- Markov Decision Process:
- Set of states S
- Start state
- Set of actions A
- Transitions
- Rewards (and discount $\gamma$)
- Policy = Choice of action for each state
- Utility = Sum of (discounted) rewards (see the sketch after this list)
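A minimal sketch of these ingredients as Python data, using a made-up two-state MDP; the state and action names, probabilities, and rewards are illustrative assumptions, not from the slides.

```python
DISCOUNT = 0.9  # discount gamma

# Set of states S and set of actions A
STATES = ["s0", "s1"]
ACTIONS = ["stay", "go"]

# Transitions: T[(s, a)] -> list of (next_state, probability)
T = {
    ("s0", "stay"): [("s0", 1.0)],
    ("s0", "go"):   [("s1", 0.8), ("s0", 0.2)],
    ("s1", "stay"): [("s1", 1.0)],
    ("s1", "go"):   [("s0", 1.0)],
}

# Rewards: R[(s, a, s')] -> immediate reward
R = {
    ("s0", "stay", "s0"): 0.0,
    ("s0", "go",   "s1"): 1.0,
    ("s0", "go",   "s0"): 0.0,
    ("s1", "stay", "s1"): 2.0,
    ("s1", "go",   "s0"): 0.0,
}

# Policy: one chosen action per state
policy = {"s0": "go", "s1": "stay"}

def discounted_utility(rewards, gamma=DISCOUNT):
    """Utility of a reward sequence = sum of discounted rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Utility of receiving rewards 1, 2, 2, 2 on successive steps
print(discounted_utility([1.0, 2.0, 2.0, 2.0]))
```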
Convergence
- Claim: Value Iteration Converges
- Notation:
- $V_k$: a vector of size $|S|$, where $S$ denotes the set of states
- $V_{k+1} = B V_k$, where the operator $B$ denotes one iteration of the Bellman update (see the sketch below)
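A minimal sketch of the Bellman update as the operator $B$ applied to the vector $V_k$, run repeatedly until the max change between $V_{k+1}$ and $V_k$ is small, which illustrates the convergence claim. It reuses the same toy two-state MDP as the sketch above; the MDP, iteration cap, and tolerance are illustrative assumptions.

```python
GAMMA = 0.9
STATES = ["s0", "s1"]
ACTIONS = ["stay", "go"]

# T[(s, a)] = list of (s', prob); R[(s, a, s')] = reward (toy values)
T = {("s0", "stay"): [("s0", 1.0)],
     ("s0", "go"):   [("s1", 0.8), ("s0", 0.2)],
     ("s1", "stay"): [("s1", 1.0)],
     ("s1", "go"):   [("s0", 1.0)]}
R = {("s0", "stay", "s0"): 0.0, ("s0", "go", "s1"): 1.0,
     ("s0", "go", "s0"): 0.0, ("s1", "stay", "s1"): 2.0,
     ("s1", "go", "s0"): 0.0}

def bellman_update(V):
    """One application of the operator B: V_{k+1} = B V_k."""
    return {s: max(sum(p * (R[(s, a, s2)] + GAMMA * V[s2])
                       for s2, p in T[(s, a)])
                   for a in ACTIONS)
            for s in STATES}

V = {s: 0.0 for s in STATES}  # V_0: a vector of size |S|
for k in range(500):
    V_next = bellman_update(V)
    # Max-norm change between successive iterates; it keeps shrinking,
    # which is why value iteration converges.
    delta = max(abs(V_next[s] - V[s]) for s in STATES)
    V = V_next
    if delta < 1e-6:
        break
print(f"converged after {k + 1} Bellman updates: {V}")
```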