#notes#cs471

Recap: Defining MDP

  • Markov Decision Process (see the code sketch after this list):
    • Set of states S
    • Start state s_0
    • Set of actions A
    • Transitions T(s, a, s') = P(s' | s, a)
    • Rewards R(s, a, s') (and discount γ)
  • Policy = Choice of action for each state
  • Utility = Sum of (discounted) rewards
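
A minimal sketch of these components as a Python data structure. The field names (states, start, actions, transitions, rewards, gamma) and the tiny two-state example are illustrative choices, not notation from the lecture.

```python
# Sketch of an MDP container matching the components listed above.
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str

@dataclass
class MDP:
    states: List[State]        # set of states S
    start: State               # start state s_0
    actions: List[Action]      # set of actions A
    # transitions[(s, a)] is a list of (next_state, probability) pairs,
    # i.e. T(s, a, s') = P(s' | s, a)
    transitions: Dict[Tuple[State, Action], List[Tuple[State, float]]]
    # rewards[(s, a, s')] is the reward for taking a in s and landing in s'
    rewards: Dict[Tuple[State, Action, State], float]
    gamma: float               # discount factor γ

# Hypothetical two-state example: from "a", action "go" reaches "b"
# with probability 0.9 (reward 1.0) and stays in "a" with probability 0.1.
mdp = MDP(
    states=["a", "b"],
    start="a",
    actions=["go", "stay"],
    transitions={
        ("a", "go"): [("b", 0.9), ("a", 0.1)],
        ("a", "stay"): [("a", 1.0)],
        ("b", "go"): [("b", 1.0)],
        ("b", "stay"): [("b", 1.0)],
    },
    rewards={
        ("a", "go", "b"): 1.0,
        ("a", "go", "a"): 0.0,
        ("a", "stay", "a"): 0.0,
        ("b", "go", "b"): 0.0,
        ("b", "stay", "b"): 0.0,
    },
    gamma=0.9,
)
```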

Convergence

  • Claim: Value Iteration Converges
  • Notation:
    • U_t, a vector of size |S|, where S denotes the set of states
    • U_{t+1} = B U_t, where the operator B denotes one iteration of the Bellman update
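
A short sketch of the Bellman update operator B and of value iteration run until successive iterates stop changing, reusing the MDP sketch above; the convergence argument (not shown here) is that B is a γ-contraction in the max norm, so U_t approaches the unique fixed point. The tolerance value and function names are illustrative assumptions.

```python
def bellman_update(mdp: MDP, U: Dict[State, float]) -> Dict[State, float]:
    """One application of B: U'(s) = max_a sum_{s'} T(s,a,s') [R(s,a,s') + γ U(s')]."""
    U_next = {}
    for s in mdp.states:
        best = float("-inf")
        for a in mdp.actions:
            q = sum(p * (mdp.rewards[(s, a, s2)] + mdp.gamma * U[s2])
                    for s2, p in mdp.transitions[(s, a)])
            best = max(best, q)
        U_next[s] = best
    return U_next

def value_iteration(mdp: MDP, tol: float = 1e-6) -> Dict[State, float]:
    """Iterate U_{t+1} = B U_t until the max-norm change falls below tol."""
    U = {s: 0.0 for s in mdp.states}
    while True:
        U_next = bellman_update(mdp, U)
        if max(abs(U_next[s] - U[s]) for s in mdp.states) < tol:
            return U_next
        U = U_next

# Uses the illustrative `mdp` defined in the earlier sketch.
print(value_iteration(mdp))
```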