#notes#cs471

Recap

Deterministic Games:

  • States: S (start at s₀)
  • Players: P = {1, …, N} (take turns)
  • Player(s): player whose turn it is to move in state s
  • Actions(s): set of legal moves in state s
  • Transition function: Result(s, a), the state resulting from taking action a in state s
  • Terminal test: Terminal-Test(s), true when the game is over
  • A solution for a player is a ‘policy’: a mapping from states to moves.

Value of a state:

  • Best achievable outcome (utility) from a state.

Minimax Values

  • Consider opponent in game
  • Opponent attempts to minimize utility
  • Agent/Player wants to maximize utility
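The minimax recursion can be sketched on an explicit game tree. The nested-list representation and helper function below are illustrative assumptions, not from the notes: a node is either a terminal utility (for MAX) or a list of child nodes.

```python
# Minimax value of a game tree, sketched on an explicit nested-list tree.
# A node is either a number (terminal utility for MAX) or a list of children.

def minimax_value(node, maximizing):
    """Best achievable utility from this node, assuming optimal play by both sides."""
    if isinstance(node, (int, float)):   # terminal test
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX moves at the root; MIN picks the minimum at the next level.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_value(tree, True))  # → 3
```

MIN reduces the three subtrees to 3, 2, and 2, so MAX's best achievable value at the root is 3.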

Alpha-Beta Pruning (& Tree Pruning)

Minimax Pruning

  • On the opponent’s (MIN) side, prune the higher-utility branches and keep only the minimum utility
    • The opponent will not let you get high utility, so there is no point exploring those nodes

Alpha-Beta Pruning

  • When computing MIN-VALUE at some node n
    • Loop over n’s children
  • Let α be the best value that MAX can get at any choice point along the current path from the root
  • If the value at n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children.
  • The MAX version is symmetric.

Implementation

  • α: MAX’s best option on the path to the root
  • β: MIN’s best option on the path to the root

Basically, avoid worse branches: if the opponent can force you into a state worse than the worst state of a better branch, there is no point exploring that branch.
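
A minimal sketch of this pruning, using a nested-list game tree (the representation is an illustrative assumption, not from the notes):

```python
# Alpha-beta pruning: alpha = MAX's best option so far on the path to the root,
# beta = MIN's best option so far on the path to the root.

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):   # terminal test
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:            # MIN will never let play reach here
                break                    # prune the remaining children
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:            # MAX already has a better option (alpha)
                break                    # prune the remaining children
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, float('-inf'), float('inf'), True))  # → 3
```

Pruning never changes the minimax value at the root; it only skips branches that cannot affect it.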

Resource Limits

  • Problem: In realistic games, we cannot search to the leaves
  • Solution: Depth-Limited Search
    • Search only a limited depth in the tree.
    • Replace terminal utilities with an evaluation function for non-terminal positions.
    • Guarantee of optimal play is gone.

Heuristic Alpha-Beta Pruning

  • Heuristic MINIMAX: H-Minimax(s, d)
  • Idea: Treat nonterminal nodes as if they were terminal.
  • Evaluation function: Eval(s), estimates a state’s utility.
  • A cutoff test: Cutoff-Test(s, d), true when we should stop searching (e.g., the depth limit is reached).
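
A sketch of depth-limited heuristic minimax under some assumptions: a nested-list game tree, a depth-based cutoff test, and a crude stand-in evaluation function (all illustrative, not from the notes).

```python
# Heuristic minimax: recurse until the cutoff test fires, then fall back to
# an evaluation function instead of searching to the leaves.

def h_minimax(node, depth, maximizing, eval_fn, depth_limit=2):
    if isinstance(node, (int, float)):   # true terminal state
        return node
    if depth >= depth_limit:             # cutoff test: stop searching here
        return eval_fn(node)             # estimate instead of recursing
    values = [h_minimax(c, depth + 1, not maximizing, eval_fn, depth_limit)
              for c in node]
    return max(values) if maximizing else min(values)

# Crude stand-in evaluation: average of all utilities reachable below.
def flat_avg(node):
    if isinstance(node, (int, float)):
        return node
    vals = [flat_avg(c) for c in node]
    return sum(vals) / len(vals)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, 0, True, flat_avg, depth_limit=1))
```

With a deep enough limit the result matches true minimax; with a shallow limit the answer depends on the evaluation function, which is why the guarantee of optimal play is gone.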

Evaluation Functions

  • Desirable Properties
    • Order terminal states in the same way as the true utility function
    • Strongly Correlated with actual minimax value of states
    • Efficient to calculate
  • Typically use features — characteristics of game state that are correlated with winning.
  • Evaluation function combines features to produce a score, typically a weighted linear sum: Eval(s) = w₁f₁(s) + w₂f₂(s) + … + wₙfₙ(s)
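
A sketch of such a weighted linear evaluation function, using chess-style material counting. The feature choices and weights are illustrative assumptions:

```python
# Weighted linear evaluation: Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)

def eval_state(features, weights):
    """Combine feature values with weights into a single score."""
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical features: (pawn diff, knight diff, rook diff, queen diff)
# for MAX, with the classic material weights.
features = (2, 0, 1, 0)        # up two pawns and a rook
weights  = (1, 3, 5, 9)
print(eval_state(features, weights))  # → 7
```

The weights encode how strongly each feature correlates with winning; tuning them is what makes the function track the true minimax value.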

Monte Carlo Tree Search

  • Instead of using heuristic evaluation function
    • Estimate value by averaging utility over several simulations
  • Simulation chooses moves first for one player, then the other, until a terminal position is reached.
    • Moves are chosen randomly or according to some ‘playout policy’ (which prefers good moves)
  • Exploration (try nodes with few playouts) vs. Exploitation (try nodes that have done well in the past)
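
The core idea of estimating a value by averaging simulations can be sketched as follows, again on a nested-list game tree (an illustrative assumption). This shows only uniformly random playouts; a real playout policy and the exploration/exploitation tradeoff are not modeled here.

```python
import random

# Estimate a node's value by averaging the utilities of random playouts.

def rollout(node, rng):
    """Play random moves from node until a terminal utility is reached."""
    while not isinstance(node, (int, float)):
        node = rng.choice(node)     # uniformly random playout policy
    return node

def mc_value(node, n_sims=1000, seed=0):
    rng = random.Random(seed)       # seeded for reproducibility
    return sum(rollout(node, rng) for _ in range(n_sims)) / n_sims

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(round(mc_value(tree), 2))    # average utility over random playouts
```

Note that the average over random play is not the minimax value; MCTS closes that gap by biasing move selection toward nodes that have done well (exploitation) while still trying under-visited nodes (exploration).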