#notes#cs471

Recap: Defining MDP

  • Markov Decision Process (see the code sketch after this list):
    • Set of states S
    • Start state s_0
    • Set of actions A
    • Transitions T(s, a, s') = P(s' | s, a)
    • Rewards R(s, a, s') (and discount γ)
  • Policy = Choice of action for each state
  • Utility = Sum of (discounted) rewards
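
A minimal sketch of these components as a Python data structure. The field names (states, start, actions, transitions, rewards, gamma) and the tiny two-state example are illustrative choices, not notation from the lecture.

```python
# Sketch of an MDP container matching the components listed above.
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str

@dataclass
class MDP:
    states: List[State]        # set of states S
    start: State               # start state s_0
    actions: List[Action]      # set of actions A
    # transitions[(s, a)] is a list of (next_state, probability) pairs,
    # i.e. T(s, a, s') = P(s' | s, a)
    transitions: Dict[Tuple[State, Action], List[Tuple[State, float]]]
    # rewards[(s, a, s')] is the reward for taking a in s and landing in s'
    rewards: Dict[Tuple[State, Action, State], float]
    gamma: float               # discount factor γ

# Hypothetical two-state example: from "a", action "go" reaches "b"
# with probability 0.9 (reward 1.0) and stays in "a" with probability 0.1.
mdp = MDP(
    states=["a", "b"],
    start="a",
    actions=["go", "stay"],
    transitions={
        ("a", "go"): [("b", 0.9), ("a", 0.1)],
        ("a", "stay"): [("a", 1.0)],
        ("b", "go"): [("b", 1.0)],
        ("b", "stay"): [("b", 1.0)],
    },
    rewards={
        ("a", "go", "b"): 1.0,
        ("a", "go", "a"): 0.0,
        ("a", "stay", "a"): 0.0,
        ("b", "go", "b"): 0.0,
        ("b", "stay", "b"): 0.0,
    },
    gamma=0.9,
)
```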

Convergence

  • Claim: Value Iteration Converges
  • Notation:
    • U_t, a vector of size |S|, where S denotes the set of states
    • U_{t+1} = B U_t, where the operator B denotes one iteration of the Bellman update
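
A short sketch of the Bellman update operator B and of value iteration run until successive iterates stop changing, reusing the MDP sketch above; the convergence argument (not shown here) is that B is a γ-contraction in the max norm, so U_t approaches the unique fixed point. The tolerance value and function names are illustrative assumptions.

```python
def bellman_update(mdp: MDP, U: Dict[State, float]) -> Dict[State, float]:
    """One application of B: U'(s) = max_a sum_{s'} T(s,a,s') [R(s,a,s') + γ U(s')]."""
    U_next = {}
    for s in mdp.states:
        best = float("-inf")
        for a in mdp.actions:
            q = sum(p * (mdp.rewards[(s, a, s2)] + mdp.gamma * U[s2])
                    for s2, p in mdp.transitions[(s, a)])
            best = max(best, q)
        U_next[s] = best
    return U_next

def value_iteration(mdp: MDP, tol: float = 1e-6) -> Dict[State, float]:
    """Iterate U_{t+1} = B U_t until the max-norm change falls below tol."""
    U = {s: 0.0 for s in mdp.states}
    while True:
        U_next = bellman_update(mdp, U)
        if max(abs(U_next[s] - U[s]) for s in mdp.states) < tol:
            return U_next
        U = U_next

# Uses the illustrative `mdp` defined in the earlier sketch.
print(value_iteration(mdp))
```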