#notes#cs471

Policy Evaluation

Fixed Policies

  • Expectimax trees max over all actions to compute the optimal values
  • If we fix some policy π(s), the tree becomes simpler: only one action per state.
  • Recursive relation (one-step lookahead / Bellman equation): $V^{\pi}(s) = \sum_{s'} T(s, \pi(s), s') \, [R(s, \pi(s), s') + \gamma V^{\pi}(s')]$
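
A minimal sketch of evaluating a fixed policy by repeating the one-step lookahead until the values stop changing. The two-state MDP, the `T[state][action]` layout, the discount `GAMMA`, and the name `evaluate_policy` are all assumptions made up for illustration; they are not from the course material.

```python
# Hypothetical two-state MDP (illustration only).
# T[state][action] is a list of (probability, next_state, reward) outcomes.
T = {
    "A": {"stay": [(1.0, "A", 1.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 3.0)], "go": [(1.0, "A", 0.0)]},
}
GAMMA = 0.5  # assumed discount factor


def evaluate_policy(policy, T, gamma, tol=1e-10):
    """Iterate the fixed-policy Bellman update until values change by less than tol."""
    V = {s: 0.0 for s in T}
    while True:
        # Only the action chosen by the policy is considered: no max over actions.
        new_V = {
            s: sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][policy[s]])
            for s in T
        }
        if max(abs(new_V[s] - V[s]) for s in T) < tol:
            return new_V
        V = new_V


always_stay = {"A": "stay", "B": "stay"}
values = evaluate_policy(always_stay, T, GAMMA)
print(values)  # approximately A: 2.0, B: 6.0
```

For this toy MDP the result can be checked by hand: staying in A forever earns 1 / (1 - 0.5) = 2, and staying in B forever earns 3 / (1 - 0.5) = 6.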

Value Iteration

  • Each iteration updates both values and policy
  • We don’t track the policy explicitly, but taking the max over actions implicitly recomputes it
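
A rough sketch of value iteration on a made-up two-state MDP. The loop never stores a policy; the policy is only read off at the end with a greedy one-step lookahead. The MDP, `GAMMA`, and the helper names `q_value`, `value_iteration`, and `greedy_policy` are hypothetical and chosen for illustration.

```python
# Hypothetical two-state MDP (illustration only).
# T[state][action] is a list of (probability, next_state, reward) outcomes.
T = {
    "A": {"stay": [(1.0, "A", 1.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 3.0)], "go": [(1.0, "A", 0.0)]},
}
GAMMA = 0.5  # assumed discount factor


def q_value(s, a, V, T, gamma):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a])


def value_iteration(T, gamma, tol=1e-10):
    """Repeat the max-over-actions Bellman update until values settle."""
    V = {s: 0.0 for s in T}
    while True:
        new_V = {s: max(q_value(s, a, V, T, gamma) for a in T[s]) for s in T}
        if max(abs(new_V[s] - V[s]) for s in T) < tol:
            return new_V
        V = new_V


def greedy_policy(V, T, gamma):
    """Recover the policy that the max over actions was implicitly choosing."""
    return {s: max(T[s], key=lambda a: q_value(s, a, V, T, gamma)) for s in T}


optimal_values = value_iteration(T, GAMMA)
optimal_policy = greedy_policy(optimal_values, T, GAMMA)
print(optimal_values)  # approximately A: 3.0, B: 6.0
print(optimal_policy)  # {'A': 'go', 'B': 'stay'}
```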

Policy Iteration

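  • Several passes update the utilities under a fixed policy (policy evaluation)
  • After the policy is evaluated, a new policy is chosen by one-step lookahead on those utilities (policy improvement)
  • The new policy is at least as good as the old one; if it is unchanged, we are done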
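
The loop described above might be sketched as follows, alternating evaluation and improvement until the policy stops changing. Everything concrete here (the two-state MDP, `GAMMA`, the starting policy, and the function names) is an assumption for illustration only.

```python
# Hypothetical two-state MDP (illustration only).
# T[state][action] is a list of (probability, next_state, reward) outcomes.
T = {
    "A": {"stay": [(1.0, "A", 1.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 3.0)], "go": [(1.0, "A", 0.0)]},
}
GAMMA = 0.5  # assumed discount factor


def q_value(s, a, V, T, gamma):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a])


def evaluate_policy(policy, T, gamma, tol=1e-10):
    """Policy evaluation: several passes with the policy held fixed."""
    V = {s: 0.0 for s in T}
    while True:
        new_V = {s: q_value(s, policy[s], V, T, gamma) for s in T}
        if max(abs(new_V[s] - V[s]) for s in T) < tol:
            return new_V
        V = new_V


def improve_policy(V, T, gamma):
    """Policy improvement: pick the best action by one-step lookahead on V."""
    return {s: max(T[s], key=lambda a: q_value(s, a, V, T, gamma)) for s in T}


def policy_iteration(T, gamma, initial_policy):
    policy = dict(initial_policy)
    rounds = 0
    while True:
        V = evaluate_policy(policy, T, gamma)
        new_policy = improve_policy(V, T, gamma)
        rounds += 1
        if new_policy == policy:  # no change, so we are done
            return policy, V, rounds
        policy = new_policy


final_policy, final_values, rounds = policy_iteration(
    T, GAMMA, {"A": "stay", "B": "stay"}
)
print(final_policy)  # {'A': 'go', 'B': 'stay'}
print(rounds)        # 2
```

Starting from "always stay", the first round evaluates that policy (A: 2, B: 6) and switches A to "go"; the second round finds nothing left to improve and stops.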