Adversarial Search - II


CSE 440: Introduction to Artificial Intelligence

Vishnu Boddeti

September 29, 2020


Content Credits: CMU AI, http://ai.berkeley.edu

Multi-Agent Utilities

  • What if the game is non-zero-sum, or has multiple players?
  • Generalization of minimax:
    • Terminals have utility tuples
    • Node values are also utility tuples
    • Each player maximizes its own component
    • Can give rise to cooperation and competition dynamically (see the sketch below)
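As a minimal sketch of this generalization, assuming a hypothetical game interface (`is_terminal`, `utilities`, `player`, `children` are stand-ins, not from the slides): each node's value is the utility tuple of whichever child maximizes the moving player's own component.

```python
# Sketch of generalized minimax with utility tuples. The game interface
# (is_terminal, utilities, player, children) is hypothetical;
# utilities(state) returns a tuple with one utility per player.

def multi_agent_value(state, game):
    if game.is_terminal(state):
        return game.utilities(state)
    p = game.player(state)  # index of the player to move at this node
    # The moving player picks the child whose value tuple maximizes its
    # own component; the other players' components just ride along.
    return max((multi_agent_value(child, game) for child in game.children(state)),
               key=lambda utils: utils[p])
```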

Worst Case vs Average Case

  • Idea: Uncertain outcomes are controlled by chance, not an adversary.

Stochastic Games

  • Why would we not know what the result of an action will be?
    • Explicit randomness: rolling dice
    • Unpredictable opponents: the ghosts respond randomly
    • Actions can fail: when moving a robot, wheels might slip
  • Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes

Expectimax Search

  • Expectimax search: compute the average score under optimal play
    • Max nodes as in minimax search
    • Chance nodes are like min nodes but the outcome is uncertain
    • Calculate their expected utilities
    • i.e., take weighted average (expectation) of children
  • Later, we will learn how to formalize the underlying uncertain-result problems as Markov Decision Processes

Expectimax Implementation
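The slide's code did not survive extraction; here is a minimal Python sketch in the spirit of the usual pseudocode, assuming a hypothetical state interface (`is_terminal`, `utility`, `is_max_node`, `successors`, `chance_outcomes` are stand-ins):

```python
# Minimal expectimax sketch; the state interface used here is a
# hypothetical stand-in, not the course's actual API.

def expectimax_value(state):
    if state.is_terminal():
        return state.utility()
    if state.is_max_node():
        # Max node: exactly as in minimax search.
        return max(expectimax_value(s) for s in state.successors())
    # Chance node: weighted average (expectation) over the outcomes,
    # where chance_outcomes() yields (probability, successor) pairs.
    return sum(p * expectimax_value(s) for p, s in state.chance_outcomes())
```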

Expectimax Example

  • For a chance node whose children are worth 8, 24, and -12, reached with probabilities 1/2, 1/3, and 1/6:

\begin{equation} v = \frac{1}{2} \cdot 8 + \frac{1}{3} \cdot 24 + \frac{1}{6} \cdot (-12) = 10 \end{equation}

Expectimax Pruning

  • Unlike minimax, expectimax cannot be pruned in general: a chance node's value depends on all of its children, so an unseen child can always change the expectation unless the leaf values are known to be bounded

Depth-Limited Expectimax

  • As with minimax, search only to a limited depth and replace the cut-off subtrees with an evaluation-function estimate


Primer: Probabilities

  • A random variable represents an event whose outcome is unknown.
  • A probability distribution is an assignment of weights to outcomes
  • Example: Traffic on freeway
    • Random variable: $T = \mbox{ whether there is traffic}$
    • Outcomes: $T \in \{none, mild, heavy\}$
    • Distribution: $P(T=none) = 0.25$, $P(T=mild) = 0.50$, $P(T=heavy) = 0.25$

Primer: Probabilities...

  • Some laws of probability (more later):
    • Probabilities are always non-negative
    • Probabilities over all possible outcomes sum to one
  • As we get more evidence, probabilities may change:
    • $P(T=heavy) = 0.25$, $P(T=heavy \mid Hour=8am) = 0.60$
    • We will talk about methods for reasoning and updating probabilities later
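As a quick sanity check of the two laws above, here is the traffic distribution from the previous slide encoded in plain Python:

```python
# The traffic distribution from the example, encoded as outcome -> weight.
P_T = {"none": 0.25, "mild": 0.50, "heavy": 0.25}

# Law 1: probabilities are non-negative.
assert all(p >= 0 for p in P_T.values())
# Law 2: probabilities over all possible outcomes sum to one.
assert abs(sum(P_T.values()) - 1.0) < 1e-9
```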

Primer: Expectations

  • The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
  • Example: How long to get to the airport?
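The slide's worked numbers did not survive extraction; assuming, for illustration, travel times of 20, 30, and 60 minutes under the traffic distribution above:

\begin{equation} E[\mbox{time}] = 0.25 \cdot 20 + 0.50 \cdot 30 + 0.25 \cdot 60 = 35 \mbox{ min} \nonumber \end{equation}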

What probabilities to use?

  • In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
    • Model could be a simple uniform distribution (roll a die)
    • Model could be sophisticated and require a great deal of computation
    • We have a chance node for any outcome out of our control: opponent or environment
    • The model might say that adversarial actions are likely.
  • For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes

Quiz: Informed Probabilities

  • Let us say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise.
  • Question: What tree search should you use?
  • Answer: Expectimax
    • To figure out EACH chance node’s probabilities, you have to run a simulation of your opponent
    • This kind of thing gets very slow very quickly
    • Even worse if you have to simulate your opponent simulating you
    • Except for minimax, which has the nice property that it all collapses into one game tree
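To make the simulation concrete, here is a hypothetical sketch; the helpers `legal_actions` and `depth2_minimax_action` stand in for your game logic and a full depth-2 minimax search:

```python
# Hypothetical sketch: the opponent plays its depth-2 minimax move with
# probability 0.8 and a uniformly random legal move otherwise, so each
# chance node's distribution requires running a minimax search first.

def opponent_distribution(state, legal_actions, depth2_minimax_action):
    actions = legal_actions(state)
    best = depth2_minimax_action(state)   # the expensive simulation step
    uniform = 0.2 / len(actions)          # the random 20%, spread uniformly
    return {a: uniform + (0.8 if a == best else 0.0) for a in actions}
```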

The Dangers of Optimism and Pessimism

  • Dangerous Optimism
    • Assuming chance when the world is adversarial
  • Dangerous Pessimism
    • Assuming the worst case when it is not likely

Assumptions vs Reality

                      Adversarial Ghost            Random Ghost
Minimax Pacman        Won 5/5, Avg. Score: 483     Won 5/5, Avg. Score: 493
Expectimax Pacman     Won 1/5, Avg. Score: -303    Won 5/5, Avg. Score: 503

Mixed Agent Games

  • E.g., Backgammon
  • Expectiminimax
    • Environment is an extra "random agent" player that moves after each min/max agent
    • Each node computes the appropriate combination of its children (a sketch follows below)
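A minimal expectiminimax sketch, under the same hypothetical state interface as before (`node_type`, `successors`, and `chance_outcomes` are assumed stand-ins):

```python
# Expectiminimax sketch: max and min nodes as usual, plus chance nodes for
# the environment's "random agent". The state interface is hypothetical;
# chance_outcomes() yields (probability, successor) pairs.

def expectiminimax(state):
    if state.is_terminal():
        return state.utility()
    if state.node_type() == "max":
        return max(expectiminimax(s) for s in state.successors())
    if state.node_type() == "min":
        return min(expectiminimax(s) for s in state.successors())
    # Chance node: expectation over the dice outcomes.
    return sum(p * expectiminimax(s) for p, s in state.chance_outcomes())
```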

Mixed Agent Games: Backgammon

  • Dice rolls increase the branching factor $b$: 21 distinct rolls with 2 dice
    • $\approx 20$ legal moves per position
    • Depth 2: $20 \times (21 \times 20)^3 \approx 1.5 \times 10^9$ nodes
  • As depth increases, probability of reaching a given search node shrinks
    • So usefulness of search is diminished
    • So limiting depth is less damaging
    • But pruning is trickier
  • Historic AI: TD-Gammon used depth-2 search + a very good evaluation function + reinforcement learning to reach world-champion-level play
  • The first AI world champion in any game!

Maximum Expected Utility

  • Why should we average utilities? Why not minimax?
  • Principle of maximum expected utility:
    • A rational agent should choose the action that maximizes its expected utility, given its knowledge
  • Questions:
    • Where do utilities come from?
    • How do we know such utilities even exist?
    • How do we know that averaging even makes sense?
    • What if our behavior (preferences) cannot be described by utilities?

What utilities to use?

  • For worst-case minimax reasoning, the scale of the terminal evaluation function does not matter
    • We just want better states to have higher evaluations (get the ordering right)
    • We call this insensitivity to monotonic transformations
  • For average-case expectimax reasoning, we need the magnitudes to be meaningful (see the numeric illustration below)
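A quick numerical illustration with made-up leaf values: cubing is monotonic, so it never changes a max or min node's choice, but it can flip which lottery a chance node prefers:

```python
# Made-up leaf utilities: lottery 1 is a 50/50 coin flip over {0, 40};
# lottery 2 is a 50/50 coin flip over {20, 30}.
def expected(p, a, b):
    return p * a + (1 - p) * b

print(expected(0.5, 0, 40), expected(0.5, 20, 30))
# -> 20.0 25.0   (expectimax prefers lottery 2)

# Cube every leaf: the ordering of individual outcomes is unchanged,
# but the preferred lottery flips.
print(expected(0.5, 0**3, 40**3), expected(0.5, 20**3, 30**3))
# -> 32000.0 17500.0   (expectimax now prefers lottery 1)
```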

Utilities

  • Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences
  • Where do utilities come from?
    • In a game, may be simple $(+1/-1)$
    • Utilities summarize the agent's goals
    • Theorem: any "rational" preferences can be summarized as a utility function
  • We hard-wire utilities and let behaviors emerge
    • Why don't we let agents pick utilities?
    • Why don't we prescribe behaviors?

Preferences

  • An agent must have preferences among:
    • Prizes: $A$, $B$, etc.
    • Lotteries: situations with uncertain prizes
    • \begin{equation} L = [p, A;\ (1-p), B] \nonumber \end{equation}
  • Notation:
    • Preference: $ A \succ B$
    • Indifference: $ A \sim B$