Adversarial Search - II
CSE 440: Introduction to Artificial Intelligence
Vishnu Boddeti
September 29, 2020
Content Credits: CMU AI, http://ai.berkeley.edu
Multi-Agent Utilities
- What if the game is non-zero-sum, or has multiple players?
- Generalization of minimax:
- Terminals have utility tuples
- Node values are also utility tuples
- Each player maximizes its own component
- Can give rise to cooperation and competition dynamically...
Worst Case vs Average Case
- Idea: Uncertain outcomes controlled by chance, not an adversary.
Stochastic Games
- Why would we not know what the result of an action will be?
- Explicit randomness: rolling dice
- Unpredictable opponents: the ghosts respond randomly
- Actions can fail: when moving a robot, wheels might slip
- Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
Expectimax Search
- Expectimax search: compute the average score under optimal play
- Max nodes as in minimax search
- Chance nodes are like min nodes but the outcome is uncertain
- Calculate their expected utilities
- i.e., take weighted average (expectation) of children
- Later, we will learn how to formalize the underlying uncertain-result problems as Markov Decision Processes
Expectimax Implementation
\begin{equation}
v = \frac{1}{2}\times 8 + \frac{1}{3}\times 24 + \frac{1}{6}\times (-12) = 10
\end{equation}
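A minimal Python sketch of this chance/max recursion, checked against the worked value above. The tree encoding (nested dicts and (probability, child) pairs) is invented for illustration, not part of the course code.

```python
# Sketch of expectimax over an explicit tree: max nodes take the best child,
# chance nodes take the probability-weighted average of their children.

def expectimax(node):
    kind = node["type"]
    if kind == "leaf":
        return node["value"]
    if kind == "max":
        # Max node: best value among children, as in minimax.
        return max(expectimax(c) for c in node["children"])
    # Chance node: probability-weighted average of children.
    return sum(p * expectimax(c) for p, c in node["children"])

chance = {"type": "chance",
          "children": [(1/2, {"type": "leaf", "value": 8}),
                       (1/3, {"type": "leaf", "value": 24}),
                       (1/6, {"type": "leaf", "value": -12})]}
root = {"type": "max", "children": [chance]}
print(expectimax(root))  # 1/2*8 + 1/3*24 + 1/6*(-12) = 10 (up to float rounding)
```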
Expectimax Example
Expectimax Pruning
Primer: Probabilities
- A random variable represents an event whose outcome is unknown.
- A probability distribution is an assignment of weights to outcomes
- Example: Traffic on freeway
- Random variable: $T = \mbox{ whether there is traffic}$
- Outcomes: $T \in \{none, mild, heavy\}$
- Distribution: $P(T=none) = 0.25$, $P(T=mild) = 0.50$, $P(T=heavy) = 0.25$
Primer: Probabilities...
- Some laws of probability (more later):
- Probabilities are always non-negative
- Probabilities over all possible outcomes sum to one
- As we get more evidence, probabilities may change:
- $P(T=heavy) = 0.25$, $P(T=heavy \mid Hour=8am) = 0.60$
- We will talk about methods for reasoning and updating probabilities later
Primer: Expectations
- The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
- Example: How long to get to the airport?
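As a concrete illustration of the airport example, here is a small sketch that weights hypothetical travel times by the traffic distribution from the previous slide; the specific times (20, 30, 60 minutes) are assumed values, not from the slides.

```python
# Expected travel time = sum over outcomes of P(outcome) * time(outcome).
p = {"none": 0.25, "mild": 0.50, "heavy": 0.25}   # P(T) from the earlier slide
time = {"none": 20, "mild": 30, "heavy": 60}      # minutes (assumed values)

expected_time = sum(p[t] * time[t] for t in p)
print(expected_time)  # 0.25*20 + 0.50*30 + 0.25*60 = 35.0 minutes
```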
What probabilities to use?
- In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
- Model could be a simple uniform distribution (roll a die)
- Model could be sophisticated and require a great deal of computation
- We have a chance node for any outcome out of our control: opponent or environment
- The model might say that adversarial actions are likely.
- For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes
Quiz: Informed Probabilities
- Let us say you know that your opponent is actually running a depth-2 minimax, using its result 80% of the time and moving randomly otherwise.
- Question: What tree search should you use?
- Answer: Expectimax
- To figure out EACH chance node’s probabilities, you have to run a simulation of your opponent
- This kind of thing gets very slow very quickly
- Even worse if you have to simulate your opponent simulating you
- Except for minimax, which has the nice property that it all collapses into one game tree
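To make the idea concrete, here is a rough sketch of how such an informed chance node might be built: the distribution puts 80% on the opponent's depth-2 minimax move and spreads the remaining 20% uniformly over its legal actions. The names minimax_action, the state interface (legal_actions, result), and value (the expectimax recursion) are hypothetical helpers, not a specific library API.

```python
# Sketch only: the chance node's distribution comes from simulating the
# opponent model from the quiz, which is why this gets slow very quickly.

def opponent_distribution(state):
    actions = state.legal_actions()
    best = minimax_action(state, depth=2)   # simulate the opponent's search
    uniform = 0.20 / len(actions)           # spread the random 20% evenly
    return {a: uniform + (0.80 if a == best else 0.0) for a in actions}

def exp_value(state):
    dist = opponent_distribution(state)     # one opponent simulation per chance node
    return sum(p * value(state.result(a)) for a, p in dist.items())
```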
The Dangers of Optimism and Pessimism
- Dangerous Optimism
- Assuming chance when the world is adversarial
- Dangerous Pessimism
- Assuming the worst case when it is not likely
Assumptions vs Reality
- Pacman used depth-4 search with an eval function that avoids trouble
- Ghost used depth-2 search with an eval function that seeks Pacman
|                   | Adversarial Ghost | Random Ghost |
|-------------------|-------------------|--------------|
| Minimax Pacman    |                   |              |
| Expectimax Pacman |                   |              |
Mixed Agent Games
- E.g. Backgammon
- Expectiminimax
- Environment is an extra "random agent" player that moves after each min/max agent
- Each node computes the appropriate combination of its children
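A rough sketch of the expectiminimax recursion: the environment's chance nodes sit between the max and min agents' moves, and each node type combines its children accordingly. The node encoding (dicts with "type"/"children") is made up for illustration; a real game would generate children from states and dice rolls on the fly.

```python
# Expectiminimax over an explicit game tree with max, min, and chance nodes.

def expectiminimax(node):
    kind = node["type"]
    if kind == "leaf":
        return node["value"]
    if kind == "max":
        return max(expectiminimax(c) for c in node["children"])
    if kind == "min":
        return min(expectiminimax(c) for c in node["children"])
    # "chance": the environment (e.g. a dice roll) moves after each agent.
    return sum(p * expectiminimax(c) for p, c in node["children"])
```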
Mixed Agent Games: Backgammon
- Dice rolls increase b: 21 possible rolls with 2 dice
- $\approx$ 20 legal moves
- Depth 2 (two moves per player): $20 \times (21 \times 20)^3 \approx 1.5 \times 10^9$ nodes
- As depth increases, probability of reaching a given search node shrinks
- So usefulness of search is diminished
- So limiting depth is less damaging
- But pruning is trickier
- Historic AI: TD-Gammon used depth-2 search + a very good evaluation function + reinforcement learning: world-champion-level play
- First AI world champion in any game!
Maximum Expected Utility
- Why should we average utilities? Why not minimax?
- Principle of maximum expected utility:
- A rational agent should choose the action that maximizes its expected utility, given its knowledge
- Questions:
- Where do utilities come from?
- How do we know such utilities even exist?
- How do we know that averaging even makes sense?
- What if our behavior (preferences) cannot be described by utilities?
What utilities to use?
- For worst-case minimax reasoning, the scale of the terminal evaluation function does not matter
- We just want better states to have higher evaluations (get the ordering right)
- We call this insensitivity to monotonic transformations.
- For average-case expectimax reasoning, we need magnitudes to be meaningful
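A small illustration with made-up leaf values: a monotonic (order-preserving) transformation, such as squaring non-negative values, never changes which action minimax prefers, but it can change which action expectimax prefers.

```python
# Two actions, each leading to a chance node with two equally likely outcomes.
left_leaves = [0, 40]     # action A: outcomes, each with probability 0.5
right_leaves = [20, 30]   # action B: outcomes, each with probability 0.5

def minimax_choice(left, right):
    # Worst-case reasoning: compare the minimum outcome of each action.
    return "A" if min(left) > min(right) else "B"

def expectimax_choice(left, right):
    # Average-case reasoning: compare the expected outcome of each action.
    avg = lambda xs: sum(xs) / len(xs)
    return "A" if avg(left) > avg(right) else "B"

square = lambda xs: [x ** 2 for x in xs]   # monotone on non-negative values

print(minimax_choice(left_leaves, right_leaves))                     # B
print(minimax_choice(square(left_leaves), square(right_leaves)))     # B (unchanged)
print(expectimax_choice(left_leaves, right_leaves))                  # B (avg 20 vs 25)
print(expectimax_choice(square(left_leaves), square(right_leaves)))  # A (avg 800 vs 650)
```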
Utilities
- Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences
- Where do utilities come from?
- In a game, may be simple $(+1/-1)$
- Utilities summarize the agent's goals
- Theorem: any "rational" preferences can be summarized as a utility function
- We hard-wire utilities and let behaviors emerge
- Why don't we let agents pick utilities?
- Why don't we prescribe behaviors?
Preferences
- An agent must have preferences among:
- Prizes: $A$, $B$, etc.
- Lotteries: situations with uncertain prizes
\begin{equation}
L = [p, A;\ (1-p), B] \nonumber
\end{equation}
- Notation:
- Preference: $ A \succ B$
- Indifference: $ A \sim B$