Meet's Notes

Lecture 22 - Markov Decision Process III

Feb 10, 2024, 1 min read

#notes
#cs471

Policy Evaluation

Fixed Policies

Expectimax trees take the max over all actions to compute the optimal values.
If we fix some policy $\pi(s)$, the tree becomes simpler: only one action per state.
The values under the fixed policy then satisfy a recursive relation (one-step lookahead / Bellman equation).
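A minimal sketch of iterative policy evaluation. It assumes the usual notation: $T(s,a,s')$ is the transition probability, $R(s,a,s')$ is the reward, and $\gamma$ is the discount. With the policy fixed, the recursive relation is $V^{\pi}(s) = \sum_{s'} T(s,\pi(s),s')\,[R(s,\pi(s),s') + \gamma V^{\pi}(s')]$. Repeating it as an update until the values stop changing gives $V^{\pi}$. The three-state MDP (`cool` / `warm` / `overheated`), the discount of 0.9, and the names `T`, `q_value`, and `evaluate_policy` are hypothetical choices for illustration, not taken from the lecture.

```python
# Hypothetical three-state MDP used only for illustration (not from the lecture).
# T[state][action] is a list of (probability, next_state, reward) triples;
# a state with no actions is terminal.
T = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "overheated", -10.0)],
    },
    "overheated": {},
}
GAMMA = 0.9  # assumed discount factor


def q_value(T, V, s, a, gamma=GAMMA):
    """One-step lookahead: expected reward plus discounted successor value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a])


def evaluate_policy(T, policy, gamma=GAMMA, tol=1e-8):
    """Apply the fixed-policy Bellman update until no value moves more than tol."""
    V = {s: 0.0 for s in T}
    while True:
        # Only one action per state: the one the policy picks (terminals stay 0)
        new_V = {s: q_value(T, V, s, policy[s], gamma) if T[s] else 0.0 for s in T}
        delta = max(abs(new_V[s] - V[s]) for s in T)
        V = new_V
        if delta < tol:
            return V


always_slow = {"cool": "slow", "warm": "slow"}
print(evaluate_policy(T, always_slow))
```

Under the "always slow" policy every step earns +1, so both non-terminal values should come out very close to $1/(1-0.9) = 10$.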

Value Iteration

Each iteration updates both the values and, implicitly, the policy.
We don’t track the policy explicitly; taking the max over actions recomputes it on every iteration.
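A sketch of value iteration on the same hypothetical MDP, redefined so the block runs on its own. Each sweep applies $V_{k+1}(s) = \max_a \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V_k(s')]$. No policy is stored during the sweeps; one is only read off with an argmax after the values converge. The MDP, the discount, the stopping tolerance, and the function names are assumptions for illustration.

```python
# Same hypothetical MDP: T[state][action] -> [(probability, next_state, reward)].
T = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "overheated", -10.0)],
    },
    "overheated": {},  # terminal: no actions
}
GAMMA = 0.9  # assumed discount factor


def q_value(T, V, s, a, gamma=GAMMA):
    """One-step lookahead value of taking action a in state s."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a])


def value_iteration(T, gamma=GAMMA, tol=1e-8):
    V = {s: 0.0 for s in T}
    while True:
        # Bellman optimality update: max over actions (terminal states stay 0)
        new_V = {
            s: max(q_value(T, V, s, a, gamma) for a in T[s]) if T[s] else 0.0
            for s in T
        }
        delta = max(abs(new_V[s] - V[s]) for s in T)
        V = new_V
        if delta < tol:
            break
    # No policy was kept during the sweeps; read one off with an argmax at the end
    policy = {
        s: max(T[s], key=lambda a: q_value(T, V, s, a, gamma))
        for s in T if T[s]
    }
    return V, policy


V, policy = value_iteration(T)
print(V)
print(policy)
```

For this made-up MDP the extracted policy should be `fast` in `cool` and `slow` in `warm`, with values near 15.5 and 14.5.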

Policy Iteration

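Several passes update the utilities with the policy held fixed (policy evaluation).
After the policy is evaluated, a new policy is chosen by one-step lookahead on those utilities (policy improvement).
The new policy will be at least as good; if it does not change, it is optimal and we are done.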
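A sketch of policy iteration on the same hypothetical MDP. It alternates iterative evaluation of a fixed policy with greedy improvement via one-step lookahead, and it stops when the policy no longer changes. Starting from the first listed action in each state is an arbitrary assumption, and all names are hypothetical.

```python
# Same hypothetical MDP: T[state][action] -> [(probability, next_state, reward)].
T = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "overheated", -10.0)],
    },
    "overheated": {},  # terminal: no actions
}
GAMMA = 0.9  # assumed discount factor


def q_value(T, V, s, a, gamma=GAMMA):
    """One-step lookahead value of taking action a in state s."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a])


def evaluate_policy(T, policy, gamma=GAMMA, tol=1e-8):
    """Policy evaluation: several passes of the fixed-policy update."""
    V = {s: 0.0 for s in T}
    while True:
        new_V = {s: q_value(T, V, s, policy[s], gamma) if T[s] else 0.0 for s in T}
        delta = max(abs(new_V[s] - V[s]) for s in T)
        V = new_V
        if delta < tol:
            return V


def policy_iteration(T, gamma=GAMMA):
    # Arbitrary starting policy: the first listed action in each non-terminal state
    policy = {s: next(iter(T[s])) for s in T if T[s]}
    while True:
        V = evaluate_policy(T, policy, gamma)
        # Policy improvement: greedy one-step lookahead on the evaluated values
        new_policy = {
            s: max(T[s], key=lambda a: q_value(T, V, s, a, gamma))
            for s in T if T[s]
        }
        if new_policy == policy:  # policy unchanged: we are done
            return policy, V
        policy = new_policy


policy, V = policy_iteration(T)
print(policy)
print(V)
```

Starting from "always slow", one improvement step switches `cool` to `fast`, and the next evaluation leaves the policy unchanged, so the loop stops after two evaluations. Comparing whole policies for equality works here because the action values are well separated; with near-ties, a common safeguard is to keep the current action unless another is strictly better.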