Recap
Direct Evaluation
Temporal Difference Learning
- Big idea: learn from every experience, keeping a running average of observed samples
- Each sample factors in the result state s' (running-average update, sketched below)
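A minimal Python sketch of this running-average (temporal-difference) value update, assuming a tabular value table V, one observed transition (s, r, s'), and illustrative values for the learning rate α and discount γ:

```python
from collections import defaultdict

def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Nudge V(s) toward the observed sample r + gamma * V(s')."""
    sample = r + gamma * V[s_next]
    V[s] = (1 - alpha) * V[s] + alpha * sample
    return V

# Example: one observed transition under the policy being evaluated.
V = defaultdict(float)                  # value estimates, default 0
V = td_update(V, s="A", r=1.0, s_next="B")
print(V["A"])                           # 0.1 after a single update
```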
Active Reinforcement Learning
Learner has to choose between exploitation and exploration
- Learn utilities for states/actions
- Compute optimal policy for current learned model
How to Explore?
- Simplest: random actions (ε-greedy)
- Every time step, flip a coin
- With (small) probability ε, act randomly
- With (large) probability 1 − ε, act on the current policy
- Problem?
- Will keep acting randomly even once learning is done
- Can lower ε over time (see the sketch after this list)
- Can use exploration function
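A small sketch of ε-greedy action selection with ε lowered over time; the Q table, action names, and decay schedule are illustrative assumptions:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon act randomly, otherwise follow the current greedy policy."""
    if random.random() < epsilon:
        return random.choice(actions)                               # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))       # exploit

# Lower epsilon over time so random acting fades out once learning is done.
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.5}
actions = ["left", "right"]
epsilon, decay, min_epsilon = 1.0, 0.995, 0.05
for episode in range(1000):
    epsilon = max(min_epsilon, epsilon * decay)
    action = epsilon_greedy(Q, "s0", actions, epsilon)
```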
Exploration Functions
When to explore?
- Random: explore a fixed amount
- Better Idea: Explore areas whose values have not been established
- Exploration Function
- Takes a value estimate u and a visit count n, and returns an optimistic utility, e.g. f(u, n) = u + k/n with k > 0 (sketched below)
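A sketch of one such exploration function used to pick actions optimistically; the data structures, the choice of k, and the n + 1 denominator (to keep unvisited pairs finite) are assumptions for illustration:

```python
def exploration_value(u, n, k=1.0):
    """Optimistic utility f(u, n) = u + k/n: the bonus shrinks as the visit count grows."""
    return u + k / (n + 1)          # n + 1 keeps unvisited pairs (n = 0) finite

def pick_action(Q, N, state, actions, k=1.0):
    # Choose the action with the highest optimistic value f(Q(s, a), N(s, a)).
    return max(actions, key=lambda a: exploration_value(Q.get((state, a), 0.0),
                                                        N.get((state, a), 0), k))

Q = {("s0", "left"): 0.9}           # "left" looks good but has been tried a lot
N = {("s0", "left"): 50}            # "right" is unvisited, so it gets a large bonus
print(pick_action(Q, N, "s0", ["left", "right"]))   # right
```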
Q-Learning Properties
- Q-learning converges to the optimal policy even if you act suboptimally along the way (off-policy learning), provided you explore enough and decrease the learning rate appropriately (see the update sketch below)
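For reference, a sketch of the tabular Q-learning update: the behaviour policy can be exploratory, but the update bootstraps off max_a' Q(s', a'), which is why the learned values target the optimal policy. The example transition and parameters are illustrative:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update toward the sample r + gamma * max_a' Q(s', a')."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    sample = r + gamma * best_next
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * sample
    return Q

Q = {}
Q = q_update(Q, s="A", a="go", r=1.0, s_next="B", actions=["go", "stay"])
print(Q[("A", "go")])   # 0.1
```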
Regret
Even if you learn the optimal policy, you will make mistakes along the way. Regret: total mistake cost, i.e., the difference between your (expected) rewards, including youthful suboptimality, and the optimal (expected) rewards.
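A toy illustration of regret with made-up numbers: the learner's per-episode returns improve toward the optimal return, and regret accumulates the gap along the way:

```python
# Hypothetical numbers: an optimal agent would earn 10 per episode.
optimal_return_per_episode = 10.0
actual_returns = [2.0, 5.0, 8.0, 10.0, 10.0]   # learner improves over episodes

regret = sum(optimal_return_per_episode - r for r in actual_returns)
print(regret)   # 15.0: mistake cost accumulated before reaching optimal behaviour
```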
Linear Value Functions
- Using a feature representation, we can write the Q-function for any state with a few weights: Q(s, a) = w1 f1(s, a) + ... + wn fn(s, a) (see the sketch after this list)
- Advantage: Experience summed up in a few powerful numbers
- Disadvantage: states that share features may actually be very different in value
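A sketch of such a linear Q-function as a dot product of weights and feature values; the weights, features, and their interpretations are placeholders:

```python
def linear_q(weights, features):
    """Q(s, a) as a dot product of weights and feature values."""
    return sum(w * f for w, f in zip(weights, features))

weights = [2.0, -1.0]                 # experience summed up in a few powerful numbers
features = [0.5, 0.25]                # e.g. f1 = closeness to food, f2 = closeness to ghost
print(linear_q(weights, features))    # 0.75
```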
Approximate Q-Learning
- Update the weights: wi ← wi + α · [difference] · fi(s, a), where difference = [r + γ max_a' Q(s', a')] − Q(s, a) (sketched below)
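A sketch of this weight update for one observed transition, assuming the feature vector for (s, a) and the best next-state Q-value have already been computed; α, γ, and the numbers are illustrative:

```python
def approx_q_update(weights, feats_sa, r, best_next_q, alpha=0.05, gamma=0.9):
    """Move each weight in proportion to its feature value times the TD difference."""
    q_sa = sum(w * f for w, f in zip(weights, feats_sa))
    difference = (r + gamma * best_next_q) - q_sa          # TD error
    return [w + alpha * difference * f for w, f in zip(weights, feats_sa)]

weights = [0.0, 0.0]
weights = approx_q_update(weights, feats_sa=[1.0, 0.5], r=2.0, best_next_q=0.0)
print(weights)   # [0.1, 0.05]
```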