Probability


CSE 440: Introduction to Artificial Intelligence

Vishnu Boddeti


Content Credits: CMU AI, http://ai.berkeley.edu

Today

  • Random Variables
  • Joint and Marginal Probabilities
  • Conditional Probabilities
  • Product Rule, Chain Rule, Bayes Rule

Inference in Ghostbusters

  • A ghost is in the grid somewhere
  • Sensor readings tell how close a square is to the ghost
    • On the ghost: red
    • 1 or 2 away: orange
    • 3 or 4 away: yellow
    • 5+ away: green
Sensors are noisy, but we know $P(color|distance)$
$pred(red|3)$ $pred(orange|3)$ $pred(yellow|3)$ $pred(green|3)$
0.05 0.15 0.50 0.30

Uncertainty

  • General situation:
    • Observed variables (evidence): Agent knows certain things about the state of the world (e.g., sensor readings or symptoms)
    • Unobserved variables: Agent needs to reason about other aspects (e.g. where an object is or what disease is present)
    • Model: Agent knows something about how the known variables relate to the unknown variables
  • Probabilistic reasoning gives us a framework for managing our beliefs and knowledge

Random Variables

  • A random variable is some aspect of the world about which we (may) have uncertainty
    • $R =$ Is it raining?
    • $T =$ Is it hot or cold?
    • $D =$ How long will it take to drive to work?
    • $L =$ Where is the ghost?
  • We denote random variables with capital letters
  • Like variables in a CSP, random variables have domains
    • $R \in \{false, true\}$ (often written as $\{-r, +r\}$)
    • $T \in \{hot, cold\}$
    • $D \in [0, \infty)$
    • $L \in \mbox{ possible locations}$, maybe $\{(0,0), (0,1), \dots\}$

Probability Distributions

  • Associate a probability with each value


Temperature $P(T)$
$T$ $P$
hot 0.5
cold 0.5

Probability Distributions

  • Unobserved random variables have distributions
  • A distribution is a TABLE of probabilities of values
  • A probability (lower case value) is a single number $P(W=rain)=0.1$
  • Must have: \[\forall x \mbox{ } P(X=x) \geq 0\] and \[\sum_x P(X=x)=1\]

Joint Distributions

  • A joint distribution over a set of random variables: $X_1,X_2,\dots,X_n$
  • Joint distribution specifies a real number for each assignment (or outcome): \begin{equation} \begin{aligned} & P(X_1,\dots,X_n)\nonumber \\ & P(X_1=x_1,\dots,X_n=x_n)\nonumber \end{aligned} \end{equation}
    • Must obey: \begin{equation} \begin{aligned} & P(x_1,x_2,\dots,x_n) \geq 0 \nonumber \\ & \sum_{x_1,\dots,x_n} P(x_1,x_2,\dots,x_n) = 1 \nonumber \end{aligned} \end{equation}
  • Size of distribution if n variables with domain sizes d?
    • For all but the smallest distributions, impractical to write out !!

Probabilistic Models

  • A probabilistic model is a joint distribution over a set of random variables

  • Probabilistic models:
    • (Random) variables with domains
    • Assignments are called outcomes
    • Joint distributions: say whether assignments (outcomes) are likely
    • Normalized: sum to 1.0
    • Ideally: only certain variables directly interact

  • Constraint satisfaction problems:
    • Variables with domains
    • Constraints: state whether assignments are possible
    • Ideally: only certain variables directly interact

Events

  • An event is a set E of outcomes:
  • \begin{equation} P(E) = \sum_{(x_1,\dots,x_n)\in E} P(x_1,\dots,x_n) \nonumber \end{equation}
  • From a joint distribution, we can calculate the probability of any event
    • Probability that it is hot AND sunny?
    • Probability that it is hot?
    • Probability that it is hot OR sunny?
  • Typically, the events we care about are partial assignments, like $P(T=hot)$

Marginal Distributions

  • Marginal distributions are sub-tables which eliminate variables
  • Marginalization (summing out): Combine collapsed rows by adding


$T$ $W$ $P$
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Conditional Distributions

  • A simple relation between joint and conditional probabilities
    • In fact, this is taken as the definition of a conditional probability
\begin{equation} P(a|b) = \frac{P(a,b)}{P(b)} \nonumber \end{equation}
$T$ $W$ $P$
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

\begin{equation} P(W=s|T=c) = \frac{P(W=s,T=c)}{P(T=c)} = \frac{P(W=s,T=c)}{P(W=s,T=c)+P(W=r,T=c)} = \frac{0.2}{0.2+0.3} = 0.4 \nonumber \end{equation}

Conditional Distributions

  • Conditional distributions are probability distributions over some variables given fixed values of others
$P(W|T=hot)$
$W$ $P$
sun 0.8
rain 0.2

$P(W|T=cold)$
$W$ $P$
sun 0.4
rain 0.6

Normalization Tricks



$T$ $W$ $P$
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Normalization Tricks

  • Why does this work?
    • Sum of selection is P(evidence)!
    • $P(T=c)$ in this case
  • In general: \begin{equation} P(x_1|x_2) = \frac{P(x_1,x_2)}{P(x_2)} = \frac{P(x_1,x_2)}{\sum_{x_1}P(x_1,x_2)} \nonumber \end{equation}

To Normalize

  • (Dictionary) To bring or restore to a normal condition
  • Procedure:
    • Step 1: Compute $Z =$ sum over all entries
    • Step 2: Divide every entry by $Z$

Probabilistic Inference

  • Probabilistic inference: compute a desired probability from other known probabilities (e.g. conditional from joint)

  • We generally compute conditional probabilities
    • $P(\mbox{on time}|\mbox{no reported accidents}) = 0.90$
    • These represent the agent's beliefs given the evidence

  • Probabilities change with new evidence:
    • $P(\mbox{on time}|\mbox{no accidents, 5 a.m.}) = 0.95$
    • $P(\mbox{on time}|\mbox{no accidents, 5 a.m., raining}) = 0.80$
    • Observing new evidence causes beliefs to be updated

Inference by Enumeration

  • Problem Setup:
    • General Case:
      • Evidence variables: $E_1,\dots,E_k=e_1,\dots,e_k$
      • Query variables: $Q$
      • Hidden variables: $H_1,\dots,H_r$
    • We want: $P(Q|e_1\dots,e_k)$
  • Solution:
    • Step 1: Select the entries consistent with the evidence
    • Step 2: Sum out $H$ to get joint of Query and evidence \begin{equation}P(Q,e_1\dots,e_k)=\sum_{h_1,\dots,h_r}P(Q,h_1,\dots,h_r,e_1,\dots,e_k)\end{equation}
    • Step 3: Normalize \[Z = P(Q,e_1,\dots,e_r)\] \[P(Q|e_1\dots,e_r) = \frac{1}{Z}P(Q,e_1,\dots,e_r)\]

Inference by Enumeration

  • $P(W)$?


  • $P(W|winter)$?


  • $P(W|winter,hot)$?

Inference by Enumeration

  • Obvious problems:
    • Worst-case time complexity $\mathcal{O}(d^n)$
    • Space complexity $\mathcal{O}(d^n)$ to store the joint distribution

Product Rule

  • Sometimes we have conditional distributions but want the joint
\begin{equation} \begin{aligned} P(y)P(x|y) &= P(x,y) \nonumber \\ \Rightarrow P(x|y) &= \frac{P(x,y)}{P(y)} \nonumber \\ \end{aligned} \end{equation}

Product Rule

\begin{equation} P(y)P(x|y) = P(x,y) \nonumber \end{equation}
  • Example:
$W$ $P$
sun 0.8
rain 0.2
$D$ $W$ $P$
wet sun 0.1
dry sun 0.9
wet rain 0.7
dry rain 0.3

Chain Rule

  • More generally, can always write any joint distribution as an incremental product of conditional distributions
  • \begin{equation} \begin{aligned} P(x_1,x_2,x_3) &= P(x_1)P(x_2|x_1)P(x_3|x_2,x_1) \nonumber \\ P(x_1,x_2,\dots,x_n) &= \prod_{i}P(x_i|x_1,\dots,x_n) \nonumber \end{aligned} \end{equation}
  • This is always true. Why?

Bayes Rule

Bayes Rule

  • Two ways to factor a joint distribution over two variables:
  • \begin{equation} P(x,y) = P(x|y)P(y) = P(y|x)P(x) \nonumber \end{equation}
  • Dividing, we get: \begin{equation} P(x|y) = \frac{P(y|x)P(x)}{P(y)} \nonumber \end{equation}
  • Why is this at all helpful?
    • Lets us build one conditional from its reverse
    • Often one conditional is tricky but the other one is simple
    • Foundation of many systems we’ll see later.
  • In the running for most important AI equation.

Inference with Bayes Rule

  • Example: Diagnostic probability from causal probability:
  • \begin{equation} P(cause|effect) = \frac{P(effect|cause)P(cause)}{P(effect)} \nonumber \end{equation}
  • Example:
    • M: meningitis, S: stiff neck $$\begin{equation} \begin{aligned} P(+m) &= 0.0001 \nonumber \\ P(+s|+m) &= 0.8 \nonumber \\ P(+s|-m) &= 0.01 \nonumber \\ \end{aligned} \end{equation}$$
    • $$\begin{equation} \begin{aligned} P(+m|+s) &= \frac{P(+s|+m)P(+m)}{P(+s)} \nonumber \\ &= \frac{P(+s|+m)P(+m)}{P(+s|+m)P(+m)+P(+s|-m)P(-m)} \nonumber \\ &= \frac{0.8\times 0.0001}{0.8\times 0.0001+0.01\times 0.9999} \nonumber \end{aligned} \end{equation}$$
    • Note: posterior probability of meningitis still very small
    • Note: you should still get stiff necks checked out !! Why?

Ghostbusters Revisited

  • Let us say we have two distributions:
    • Prior distribution over ghost location: $P(G)$
      • Let us say this is uniform
    • Sensor reading model: $P(R|G)$
      • Given: we know what our sensors do
      • $R =$ reading color measured at $(1,1)$
      • e.g. $P(R=yellow|G=(1,1)) = 0.1$
  • We can calculate the posterior distribution $P(G|r)$ over ghost locations given a reading using Bayes' rule: \begin{equation} P(g|r) \propto P(r|g)P(g) \nonumber \end{equation}

Q & A

Image
XKCD