Probability
CSE 440: Introduction to Artificial Intelligence
Vishnu Boddeti
November 12, 2020
Content Credits: CMU AI, http://ai.berkeley.edu
Today
- Random Variables
- Joint and Marginal Probabilities
- Conditional Probabilities
- Product Rule, Chain Rule, Bayes Rule
- Inference
- Independence
Inference in Ghostbusters
- A ghost is in the grid somewhere
- Sensor readings tell how close a square is to the ghost
- On the ghost: red
- 1 or 2 away: orange
- 3 or 4 away: yellow
- 5+ away: green
Sensors are noisy, but we know P(color|distance)
- e.g., at distance 3: P(red | 3) = 0.05, P(orange | 3) = 0.15, P(yellow | 3) = 0.50, P(green | 3) = 0.30
Uncertainty
- General situation:
- Observed variables (evidence): Agent knows certain things about the state of the world (e.g., sensor readings or symptoms)
- Unobserved variables: Agent needs to reason about other aspects (e.g. where an object is or what disease is present)
- Model: Agent knows something about how the known variables relate to the unknown variables
- Probabilistic reasoning gives us a framework for managing our beliefs and knowledge
(Figure: 3×3 grids of ghost-location probabilities — a uniform prior of roughly 0.11 per square, then posteriors after sensor readings that concentrate the probability, up to 0.81 on a single square.)
Random Variables
- A random variable is some aspect of the world about which we (may) have uncertainty
- R = Is it raining?
- T = Is it hot or cold?
- D = How long will it take to drive to work?
- L = Where is the ghost?
- We denote random variables with capital letters
- Like variables in a CSP, random variables have domains
- R ∈ {false, true} (often written as {−r, +r})
- T ∈ {hot, cold}
- D ∈ [0, ∞)
- L ∈ possible locations, maybe {(0,0), (0,1), …}
Probability Distributions
- Associate a probability with each value
Weather distribution P(W):

| W | P |
|---|---|
| sun | 0.6 |
| rain | 0.1 |
| fog | 0.3 |
| meteor | 0.0 |
Probability Distributions
- Unobserved random variables have distributions
- A distribution is a TABLE of probabilities of values
- A probability (lower-case value) is a single number, e.g., P(W = rain) = 0.1
- Must have: ∀x, P(X = x) ≥ 0 and ∑_x P(X = x) = 1
Shorthand Notation:
P(hot) = P(T = hot), P(cold) = P(T = cold), P(rain) = P(W = rain)
- OK if all domain entries are unique
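As a quick illustration, a distribution can be stored as a plain table from values to probabilities, and both requirements above can be checked directly. This is a minimal sketch, with names chosen only for this example and numbers taken from the Weather table:

```python
# A distribution as a table: value -> probability (numbers from the Weather example)
P_W = {"sun": 0.6, "rain": 0.1, "fog": 0.3, "meteor": 0.0}

def is_valid_distribution(dist, tol=1e-9):
    """Check the two requirements: every entry >= 0 and the entries sum to 1."""
    nonnegative = all(p >= 0 for p in dist.values())
    sums_to_one = abs(sum(dist.values()) - 1.0) < tol
    return nonnegative and sums_to_one

print(is_valid_distribution(P_W))  # True
print(P_W["rain"])                 # P(W = rain) = 0.1
```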
Joint Distributions
- A joint distribution over a set of random variables X_1, X_2, …, X_n specifies a real number for each assignment (or outcome):
P(X_1 = x_1, …, X_n = x_n), written P(x_1, …, x_n) for short
- Must obey:
P(x_1, x_2, …, x_n) ≥ 0 and ∑_{x_1,…,x_n} P(x_1, x_2, …, x_n) = 1
- Size of the distribution if n variables each have domain size d? d^n entries
- For all but the smallest distributions, impractical to write out!
| T | W | P |
|---|---|---|
| hot | sun | 0.4 |
| hot | rain | 0.1 |
| cold | sun | 0.2 |
| cold | rain | 0.3 |
Probabilistic Models
- A probabilistic model is a joint distribution over a set of random variables
- Probabilistic models:
- (Random) variables with domains
- Assignments are called outcomes
- Joint distributions: say whether assignments (outcomes) are likely
- Normalized: sum to 1.0
- Ideally: only certain variables directly interact
- Constraint satisfaction problems:
- Variables with domains
- Constraints: state whether assignments are possible
- Ideally: only certain variables directly interact
Joint distribution P(T, W):

| T | W | P |
|---|---|---|
| hot | sun | 0.4 |
| hot | rain | 0.1 |
| cold | sun | 0.2 |
| cold | rain | 0.3 |

Constraint over T, W (possible or not):

| T | W | Possible |
|---|---|---|
| hot | sun | T |
| hot | rain | F |
| cold | sun | F |
| cold | rain | T |
Events
- An event is a set E of outcomes:
P(E)=∑(x1,…,xn)∈EP(x1,…,xn)
- From a joint distribution, we can calculate the probability of any event
- Probability that it is hot AND sunny?
- Probability that it is hot?
- Probability that it is hot OR sunny?
- Typically, the events we care about are partial assignments, like P(T=hot)
| T | W | P |
|---|---|---|
| hot | sun | 0.4 |
| hot | rain | 0.1 |
| cold | sun | 0.2 |
| cold | rain | 0.3 |
Quiz: Events
- P(+x,+y)
- P(+x)
- P(−y OR +x)
| X | Y | P |
|---|---|---|
| +x | +y | 0.4 |
| +x | -y | 0.1 |
| -x | +y | 0.2 |
| -x | -y | 0.3 |
Marginal Distributions
- Marginal distributions are sub-tables which eliminate variables
- Marginalization (summing out): Combine collapsed rows by adding
| T | W | P |
|---|---|---|
| hot | sun | 0.4 |
| hot | rain | 0.1 |
| cold | sun | 0.2 |
| cold | rain | 0.3 |
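Summing out can be written as P(X) = ∑_y P(X, y). A small sketch of this over the joint table above (the table structure and names are illustrative), computing P(T) and P(W):

```python
from collections import defaultdict

# Joint distribution P(T, W) from the table above
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def marginal(joint, keep_index):
    """Sum out every variable except the one at position keep_index."""
    out = defaultdict(float)
    for assignment, p in joint.items():
        out[assignment[keep_index]] += p
    return dict(out)

print(marginal(P_TW, 0))  # P(T): hot 0.5, cold 0.5
print(marginal(P_TW, 1))  # P(W): sun 0.6, rain 0.4
```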
Quiz: Marginal Distributions
P(X, Y):

| X | Y | P |
|---|---|---|
| +x | +y | 0.2 |
| +x | -y | 0.3 |
| -x | +y | 0.4 |
| -x | -y | 0.1 |
Conditional Distributions
- A simple relation between joint and conditional probabilities
- In fact, this is taken as the definition of a conditional probability
P(a | b) = P(a, b) / P(b)
| T | W | P |
|---|---|---|
| hot | sun | 0.4 |
| hot | rain | 0.1 |
| cold | sun | 0.2 |
| cold | rain | 0.3 |
P(W = s | T = c) = P(W = s, T = c) / P(T = c) = P(W = s, T = c) / [P(W = s, T = c) + P(W = r, T = c)] = 0.2 / (0.2 + 0.3) = 0.4
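The same calculation written as a short sketch (names are illustrative): select the entries consistent with T = cold, then divide by their total.

```python
# Joint distribution P(T, W) from the table above
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

# P(W | T = cold): keep rows with T = cold, divide by their sum P(T = cold)
selected = {w: p for (t, w), p in P_TW.items() if t == "cold"}
p_cold = sum(selected.values())                        # P(T = cold) = 0.5
P_W_given_cold = {w: p / p_cold for w, p in selected.items()}
print(P_W_given_cold)  # {'sun': 0.4, 'rain': 0.6}
```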
Quiz: Conditional Distributions
P(X,Y)
| X | Y | P |
|---|---|---|
| +x | +y | 0.2 |
| +x | -y | 0.3 |
| -x | +y | 0.4 |
| -x | -y | 0.1 |
- P(+x|+y)
- P(−x|+y)
- P(−y|+x)
Conditional Distributions
- Conditional distributions are probability distributions over some variables given fixed values of others
| T | W | P |
|---|---|---|
| hot | sun | 0.4 |
| hot | rain | 0.1 |
| cold | sun | 0.2 |
| cold | rain | 0.3 |
Normalization Tricks
P(T, W):

| T | W | P |
|---|---|---|
| hot | sun | 0.4 |
| hot | rain | 0.1 |
| cold | sun | 0.2 |
| cold | rain | 0.3 |
Normalization Tricks
- Why does this work?
- Sum of selection is P(evidence)!
- P(T=c) in this case
- In general:
P(x_1 | x_2) = P(x_1, x_2) / P(x_2) = P(x_1, x_2) / ∑_{x_1} P(x_1, x_2)
Quiz: Normalization Tricks
To Normalize
- (Dictionary) To bring or restore to a normal condition
- Procedure:
- Step 1: Compute Z= sum over all entries
- Step 2: Divide every entry by Z
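A minimal sketch of this two-step procedure (the helper name is my own). Normalizing the selected rows gives the conditional directly, without computing P(evidence) separately:

```python
def normalize(table):
    """Step 1: Z = sum over all entries. Step 2: divide every entry by Z."""
    Z = sum(table.values())
    return {k: v / Z for k, v in table.items()}

# Selecting the T = cold rows of P(T, W) gives the unnormalized table {sun: 0.2, rain: 0.3};
# normalizing it yields P(W | T = cold).
print(normalize({"sun": 0.2, "rain": 0.3}))  # {'sun': 0.4, 'rain': 0.6}
```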
Probabilistic Inference
- Probabilistic inference: compute a desired probability from other known probabilities (e.g. conditional from joint)
- We generally compute conditional probabilities
- P(on time | no reported accidents) = 0.90
- These represent the agent's beliefs given the evidence
- Probabilities change with new evidence:
- P(on time | no accidents, 5 a.m.) = 0.95
- P(on time | no accidents, 5 a.m., raining) = 0.80
- Observing new evidence causes beliefs to be updated
Inference by Enumeration
- Problem Setup:
- General Case:
- Evidence variables: E_1, …, E_k = e_1, …, e_k
- Query variables: Q
- Hidden variables: H_1, …, H_r
- We want: P(Q | e_1, …, e_k)
- Solution:
- Step 1: Select the entries consistent with the evidence
- Step 2: Sum out H to get joint of Query and evidence
P(Q, e_1, …, e_k) = ∑_{h_1,…,h_r} P(Q, h_1, …, h_r, e_1, …, e_k)
- Step 3: Normalize
Z = ∑_q P(q, e_1, …, e_k)
P(Q | e_1, …, e_k) = (1/Z) P(Q, e_1, …, e_k)
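The three steps can be sketched as one small function over a joint table. This is only a sketch with illustrative names, assuming each outcome is stored as a tuple of values:

```python
from collections import defaultdict

def enumerate_query(joint, query_index, evidence):
    """Inference by enumeration over a joint table {assignment_tuple: probability}.

    evidence maps a variable's position in the tuple to its observed value.
    """
    unnormalized = defaultdict(float)
    for assignment, p in joint.items():
        # Step 1: keep only the entries consistent with the evidence
        if all(assignment[i] == v for i, v in evidence.items()):
            # Step 2: sum out the hidden variables by grouping on the query value
            unnormalized[assignment[query_index]] += p
    # Step 3: normalize
    Z = sum(unnormalized.values())
    return {q: p / Z for q, p in unnormalized.items()}
```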
Inference by Enumeration
- P(W)?
- P(W | winter)?
- P(W | winter, hot)?
| S | T | W | P |
|---|---|---|---|
| summer | hot | sun | 0.30 |
| summer | hot | rain | 0.05 |
| summer | cold | sun | 0.10 |
| summer | cold | rain | 0.05 |
| winter | hot | sun | 0.10 |
| winter | hot | rain | 0.05 |
| winter | cold | sun | 0.15 |
| winter | cold | rain | 0.20 |
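Using the enumerate_query sketch from above on this table (assuming rows are keyed as (season, temperature, weather) tuples), the three queries become:

```python
# The joint P(S, T, W) from the table above, keyed as (season, temperature, weather)
joint_STW = {("summer", "hot",  "sun"): 0.30, ("summer", "hot",  "rain"): 0.05,
             ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
             ("winter", "hot",  "sun"): 0.10, ("winter", "hot",  "rain"): 0.05,
             ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20}

print(enumerate_query(joint_STW, 2, {}))                       # P(W): sun 0.65, rain 0.35
print(enumerate_query(joint_STW, 2, {0: "winter"}))            # P(W | winter)
print(enumerate_query(joint_STW, 2, {0: "winter", 1: "hot"}))  # P(W | winter, hot)
```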
Inference by Enumeration
- Obvious problems:
- Worst-case time complexity O(d^n)
- Space complexity O(d^n) to store the joint distribution
Product Rule
- Sometimes we have conditional distributions but want the joint
P(y) P(x | y) = P(x, y)   (rearranging P(x | y) = P(x, y) / P(y))
Product Rule
P(y) P(x | y) = P(x, y)

P(D | W):

| D | W | P |
|---|---|---|
| wet | sun | 0.1 |
| dry | sun | 0.9 |
| wet | rain | 0.7 |
| dry | rain | 0.3 |

P(D, W) (to fill in):

| D | W | P |
|---|---|---|
| wet | sun | |
| dry | sun | |
| wet | rain | |
| dry | rain | |
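As a sketch of how the second table gets filled in, assuming an illustrative prior P(W) with P(sun) = 0.8 and P(rain) = 0.2 (the prior is not part of the table above):

```python
# Assumed prior P(W), for illustration only
P_W = {"sun": 0.8, "rain": 0.2}

# Conditional P(D | W) from the table above, stored as P_D_given_W[w][d]
P_D_given_W = {"sun":  {"wet": 0.1, "dry": 0.9},
               "rain": {"wet": 0.7, "dry": 0.3}}

# Product rule: P(d, w) = P(w) * P(d | w)
P_DW = {(d, w): P_W[w] * P_D_given_W[w][d]
        for w in P_W for d in P_D_given_W[w]}
print(P_DW)  # e.g. with this prior, P(wet, rain) = 0.2 * 0.7 = 0.14
```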
Chain Rule
- More generally, can always write any joint distribution as an incremental product of conditional distributions
P(x_1, x_2, x_3) = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2)
P(x_1, x_2, …, x_n) = ∏_i P(x_i | x_1, …, x_{i−1})
- This is always true. Why? (Apply the product rule repeatedly, peeling off one variable at a time.)
Bayes Rule
- Two ways to factor a joint distribution over two variables:
P(x, y) = P(x | y) P(y) = P(y | x) P(x)
- Dividing, we get:
P(x | y) = P(y | x) P(x) / P(y)
- Why is this at all helpful?
- Lets us build one conditional from its reverse
- Often one conditional is tricky but the other one is simple
- Foundation of many systems we’ll see later.
- In the running for most important AI equation.
Inference with Bayes Rule
- Example: Diagnostic probability from causal probability:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
- Example:
- M: meningitis, S: stiff neck
P(+m) = 0.0001, P(+s | +m) = 0.8, P(+s | −m) = 0.01
P(+m | +s) = P(+s | +m) P(+m) / P(+s) = P(+s | +m) P(+m) / [P(+s | +m) P(+m) + P(+s | −m) P(−m)] = (0.8 × 0.0001) / (0.8 × 0.0001 + 0.01 × 0.9999) ≈ 0.0079
- Note: posterior probability of meningitis still very small
- Note: you should still get stiff necks checked out !! Why?
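A quick numeric check of the calculation above (a sketch; the variable names are mine):

```python
p_m = 0.0001            # P(+m)
p_s_given_m = 0.8       # P(+s | +m)
p_s_given_not_m = 0.01  # P(+s | -m)

p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)  # P(+s) by total probability
p_m_given_s = p_s_given_m * p_m / p_s                  # Bayes rule
print(p_m_given_s)  # ~0.0079: the posterior is still very small
```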
Quiz: Bayes Rule
P(D | W):

| D | W | P |
|---|---|---|
| wet | sun | 0.1 |
| dry | sun | 0.9 |
| wet | rain | 0.7 |
| dry | rain | 0.3 |
Ghostbusters Revisited
- Let us say we have two distributions:
- Prior distribution over ghost location: P(G)
- Let us say this is uniform
- Sensor reading model: P(R|G)
- Given: we know what our sensors do
- R = reading color measured at (1,1)
- e.g., P(R = yellow | G = (1,1)) = 0.1
- We can calculate the posterior distribution P(G|r) over ghost locations given a reading using Bayes' rule:
P(g | r) ∝ P(r | g) P(g)
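A minimal sketch of this posterior update; the function, the grid, and every value except P(R = yellow | G = (1,1)) = 0.1 are illustrative assumptions:

```python
def ghost_posterior(prior, reading_model, reading):
    """P(g | r) ∝ P(r | g) P(g), followed by normalization over all locations g."""
    unnormalized = {g: reading_model[g][reading] * p for g, p in prior.items()}
    Z = sum(unnormalized.values())
    return {g: p / Z for g, p in unnormalized.items()}

# Tiny two-location example; only P(R = yellow | G = (1,1)) = 0.1 comes from the slide,
# the other numbers are made up for illustration.
prior = {(1, 1): 0.5, (1, 2): 0.5}            # uniform P(G)
reading_model = {(1, 1): {"yellow": 0.1},
                 (1, 2): {"yellow": 0.4}}     # assumed value
print(ghost_posterior(prior, reading_model, "yellow"))  # {(1, 1): 0.2, (1, 2): 0.8}
```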