Lecture 08 Alpha-Beta Pruning

#notes #cs471

Recap

Deterministic Games:

States: S, (start at $S_{0}$ )
Players: $P = {1... N}$ (take turns)
$T o M o v e (s) :$ Player whose turn it is to move in state $s$
$A c t i o n s (s) :$ Set of legal moves in state $s$
$R es u lt (s, a) :$ Transition Function, state resulting from taking action $a$ in state $s$
$I s T er mina l (s) :$ Terminal Test, true when game is over
Solution for a player is a ’policy‘.

Value of a state:

Best achievable outcome (utility) from a state.

Minimax Values

Consider opponent in game
Opponent attempts to minimize utility
Agent/Player wants to maximize utility

Alpha - Beta Pruning & Tree Pruning)

Minimax Pruning

Prune nodes that are ‘highest’, and keep minimum utility (on opponents side)
- Opponent will not let you get high utility, so no point exploring those nodes

Alpha-Beta Pruning

When computing MIN-VALUE at some node $n$
- Loop over $n$ ’s children
Let $a$ be the best value that MAX can get at any choice point along current path from the root
If $n$ becomes worse than $a$ , max will avoid it, so we can stop considering $n$ ’s other children.
MAX version is symmetric.

Implementation

$α :$ MAX’s best option on path to root $β :$ MIN’s best option on path to root.

Basically, avoid worse branches. If the opponent can get you in a state that is of worse quality than the worst state of a better branch, there is no point exploring that branch.

Resource Limits

Problem: In realistic games, we cannot search to the leaves
Solution: Depth-Limited Search
- Search only a limited depth in the tree.
- Replace terminal utilities with an evaulation function for non-terminal positions.
- Guarantee of optimal play is gone.

Heuristic Alpha-Beta Pruning

Heuristic MINIMAX
Idea: Treat nonterminal nodes as if they were terminal.
$E v a l (s) :$ Evaluation Function $S \to R$
$I s C u tO ff (s) :$ A cutoff test, true when stop searching.

Evaluation Functions

Desirable Properties
- Order terminal states in same way as true ‘utility’ function
- Strongly Correlated with actual minimax value of states
- Efficient to calculate
Typically use features — characteristics of game state that are correlated with winning.
Evaluation function combines features to produce score:
- $E v a l (s) = \sum_{i} w_{i} f_{i} (s) = w_{1} f_{1} (s) + w_{2} f_{2}) s_{+} ... + w_{n} f_{n} (s)$

Monte Carlo Search Tree

Instead of using heuristic evaluation function
- Estimate value by averaging utility over several simulations
Simulation chooses moves first for one player, than another, until a terminal position reached.
- Moves chosen randomly and according to some ‘playout policy’ (choose good moves)
Exploration (try nodes with few playouts) vs. Exploitation (try nodes that have done well in the past)

Meet's Notes

Table of Contents

Lecture 08 Alpha-Beta Pruning

Recap

Deterministic Games:

Value of a state:

Minimax Values

Alpha - Beta Pruning & Tree Pruning)

Minimax Pruning

Alpha-Beta Pruning

Implementation

Resource Limits

Heuristic Alpha-Beta Pruning

Evaluation Functions

Monte Carlo Search Tree

Graph View

Backlinks

Meet's Notes

Table of Contents

Lecture 08 Alpha-Beta Pruning

Recap §

Deterministic Games: §

Value of a state: §

Minimax Values §

Alpha - Beta Pruning & Tree Pruning) §

Minimax Pruning §

Alpha-Beta Pruning §

Implementation §

Resource Limits §

Heuristic Alpha-Beta Pruning §

Evaluation Functions §

Monte Carlo Search Tree §

Graph View

Backlinks

Recap

Deterministic Games:

Value of a state:

Minimax Values

Alpha - Beta Pruning & Tree Pruning)

Minimax Pruning

Alpha-Beta Pruning

Implementation

Resource Limits

Heuristic Alpha-Beta Pruning

Evaluation Functions

Monte Carlo Search Tree