Measuring and Mitigating Bias in AI
DLAI8
Slides: hal.cse.msu.edu/talks
Vishnu Boddeti
Progress In Artificial Intelligence
Speech Processing
Image Analysis
Natural Language Processing
Physical Sciences
Key Drivers
Data, Compute, Algorithms
State-of-Affairs
(a report from the real world)
"Tay, Microsoft's AI chatbot, gets a crash course in racism from Twitter"
March 24, 2016
"FaceApp's creator apologizes for the app's skin-lightening 'hot' filter"
April 25, 2017
Real-world machine learning systems are effective, but they:
- are biased,
- violate users' privacy, and
- are not trustworthy.
Research Questions
- How do we measure bias in AI models?
- How do we mitigate bias in AI models?
Measuring Bias in Datasets
How About the Data?
- DataComp: In search of the next generation of multimodal datasets, NeurIPS D&B 2023
Measuring Hate Content in Text
- Birhane, Prabhu, Han and Boddeti, "On Hate Scaling Laws For Data-Swamps," arXiv:2306.13141
- Birhane, Prabhu, Han, Boddeti, Luccioni, "Into the LAION's Den: Investigating Hate in Multimodal Datasets," NeurIPS D&B Track 2023
Troubling Trends in Dataset Scaling
Scale exacerbates hate content.
- Birhane, Prabhu, Han and Boddeti, "On Hate Scaling Laws For Data-Swamps," arXiv:2306.13141
- Birhane, Prabhu, Han, Boddeti, Luccioni, "Into the LAION's Den: Investigating Hate in Multimodal Datasets," NeurIPS D&B Track 2023
Narrative of AI Training: "Moar data! Much wow!"
- Birhane, Prabhu, Han and Boddeti, "On Hate Scaling Laws For Data-Swamps," arXiv:2306.13141
- Birhane*, Dehdashtian*, Prabhu and Boddeti, "The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models," FAccT 2024
Evaluation on 14 CLIP Models
- Birhane, Prabhu, Han and Boddeti, "On Hate Scaling Laws For Data-Swamps," arXiv:2306.13141
- Birhane*, Dehdashtian*, Prabhu and Boddeti, "The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models," FAccT 2024
Chicago Face Dataset
Zero-shot class labels used in the evaluation:
- human being
- animal
- gorilla
- chimpanzee
- orangutan
- thief
- criminal
- suspicious person
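For context on how these labels are used: CLIP-style zero-shot classification scores an image against one text prompt per class and predicts the highest-scoring class. Below is a minimal sketch using the Hugging Face transformers CLIP API; the checkpoint and prompt template are illustrative assumptions, not the exact setup of the cited papers.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Class labels from the racial-classification evaluation above.
LABELS = ["human being", "animal", "gorilla", "chimpanzee",
          "orangutan", "thief", "criminal", "suspicious person"]

# Illustrative checkpoint; the papers evaluate many CLIP variants.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def zero_shot_predict(image: Image.Image) -> str:
    # Hypothetical prompt template; the papers' exact prompts may differ.
    prompts = [f"a photo of a {label}" for label in LABELS]
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        # (1, num_labels) image-text similarity scores
        logits = model(**inputs).logits_per_image
    return LABELS[logits.argmax(dim=-1).item()]
```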
Troubling Trends in Dataset Scaling
Scale exacerbates stereotypes.
- Birhane, Prabhu, Han and Boddeti, "On Hate Scaling Laws For Data-Swamps," arXiv:2306.13141
- Birhane*, Dehdashtian*, Prabhu and Boddeti, "The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models," FAccT 2024
Fairness: The Multi-Headed Hydra
- Verma and Rubin, "Fairness Definitions Explained," International Workshop on Software Fairness, 2018
Fairness Definitions: Statistical Parity
- $P(\hat{Y}=1|S=1) = P(\hat{Y}=1|S=0)$
Probability of a positive prediction is the same across demographic groups.
- $\hat{Y} \perp \!\!\! \perp S$
Fairness Definitions: Equalized Odds
- $P(\hat{Y}=y|Y=y, S=1) = P(\hat{Y}=y|Y=y, S=0)$
True positive and false positive rates are the same across demographic groups.
- $\hat{Y} \perp \!\!\! \perp S | Y$
Fairness Definitions: Equality of Opportunity
- $P(\hat{Y}=1|Y=1, S=1) = P(\hat{Y}=1|Y=1, S=0)$
Among qualified candidates ($Y=1$), the true positive rate is the same across demographic groups.
- $\hat{Y} \perp \!\!\! \perp S | Y=1$
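To make these definitions operational, the sketch below (plain NumPy, binary $\hat{Y}$ and binary $S$) computes the empirical gap for each criterion; a model satisfying a definition drives the corresponding gap to zero.

```python
import numpy as np

def statistical_parity_gap(y_hat, s):
    # |P(Y_hat=1 | S=1) - P(Y_hat=1 | S=0)|
    return abs(y_hat[s == 1].mean() - y_hat[s == 0].mean())

def equalized_odds_gap(y_hat, y, s):
    # Largest gap across groups in true-positive (y=1) and
    # false-positive (y=0) rates.
    gaps = []
    for y_val in (0, 1):
        rate = lambda g: y_hat[(y == y_val) & (s == g)].mean()
        gaps.append(abs(rate(1) - rate(0)))
    return max(gaps)

def equal_opportunity_gap(y_hat, y, s):
    # |P(Y_hat=1 | Y=1, S=1) - P(Y_hat=1 | Y=1, S=0)|
    rate = lambda g: y_hat[(y == 1) & (s == g)].mean()
    return abs(rate(1) - rate(0))
```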
How Fair is Your ML Model?
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
How to Estimate these Trade-Offs?
U-FaTE (Utility-Fairness Trade-Off Estimator)
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
Face Image Dataset
- Liu, Luo, Wang and Tang "Deep Learning Face Attributes in the Wild," ICCV 2015
CelebA Faces
- $Y$: high cheekbones (binary) and $S$: age and sex (continuous + binary)
Evaluation of over 1000 supervised image feature extractors.
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
CelebA Faces
- $Y$: high cheekbones (binary) and $S$: age and sex (continuous + binary)
Evaluation of over 100 zero-shot multimodal (CLIP) models.
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
Face Image Dataset
- Karkkainen and Joo "FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation," WACV 2021
FairFace Dataset
- $Y$: sex (binary) and $S$: race (7 classes)
Evaluation of over 1000 supervised image feature extractors.
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
FairFace Dataset
- $Y$: sex (binary) and $S$: race (7 classes)
Evaluation of over 100 zero-shot multimodal (CLIP) models.
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
Mitigating Bias in AI Systems
From Fair Learning to Fair Representation Learning
$Z \perp \!\!\! \perp S \Rightarrow \hat{Y} \perp \!\!\! \perp S$
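Why the implication holds: with $\hat{Y} = f(Z)$ for a (measurable) predictor $f$, independence is preserved under functions of $Z$:

$$
P(\hat{Y} \in A,\, S \in B) = P\big(Z \in f^{-1}(A),\, S \in B\big) = P\big(Z \in f^{-1}(A)\big)\,P(S \in B) = P(\hat{Y} \in A)\,P(S \in B)
$$

So enforcing $Z \perp \!\!\! \perp S$ guarantees statistical parity for any downstream predictor built on $Z$.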
Learning Fair Representations
- Target Attribute: Smile & Demographic Attribute: Gender
- Problem Definition:
- Learn a representation $\mathbf{z} \in \mathbb{R}^d$ from data $\mathbf{x}$
- Retain information necessary to predict target attribute $\mathbf{t}\in\mathcal{T}$
- Remove information related to a sensitive demographic attribute $\mathbf{s}\in\mathcal{S}$
A Fork in the Road
- Design a metric to measure sensitive demographic attribute information
- non-parametric statistical dependence measures (see the HSIC sketch after this list)
- Learn a metric to measure sensitive attribute information
- feasible in principle; many prior attempts
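One standard non-parametric dependence measure is the Hilbert-Schmidt Independence Criterion (HSIC); the sketch below is an illustrative member of this family, not necessarily the exact estimator used in the cited papers.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    # Pairwise RBF (Gaussian) kernel matrix for rows of x.
    sq = np.sum(x ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d / (2.0 * sigma ** 2))

def hsic(z, s, sigma=1.0):
    # Biased empirical HSIC estimate; zero (in the population limit,
    # with a characteristic kernel) iff Z and S are independent.
    # z: (n, d) representations; s: (n, k) sensitive attributes.
    n = z.shape[0]
    K, L = rbf_kernel(z, sigma), rbf_kernel(s, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```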
Adversarial Representation Learning
Game Theoretic Formulation
- Three-player game between:
- Encoder extracts features $\mathbf{z}$
- Target Predictor for desired task from features $\mathbf{z}$
- Adversary extracts sensitive information from features $\mathbf{z}$
$$
\min_{\mathbf{\Theta}_E,\mathbf{\Theta}_T} \underbrace{J_t(\mathbf{\Theta}_E,\mathbf{\Theta}_T)}_{\text{error of target}} \quad \text{s.t.} \quad \min_{\mathbf{\Theta}_A} \underbrace{J_s(\mathbf{\Theta}_E,\mathbf{\Theta}_A)}_{\text{error of adversary}} \geq \alpha
$$
- Adversary: a learned measure of sensitive attribute information
How do we learn model parameters?
- Simultaneous/Alternating Stochastic Gradient Descent
- Update target while keeping encoder and adversary frozen.
- Update adversary while keeping encoder and target frozen.
- Update encoder while keeping target and adversary frozen.
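A minimal PyTorch-style sketch of these alternating updates; the module and optimizer names are placeholders, and the hard constraint on the adversary's error is replaced by an illustrative penalty weight `lam`.

```python
import torch

# enc, tgt, adv: torch.nn.Module encoder, target predictor, adversary.
# opt_e, opt_t, opt_a: optimizers over their respective parameters.
lam = 1.0  # illustrative trade-off weight standing in for the constraint

def train_step(x, y, s, enc, tgt, adv, opt_e, opt_t, opt_a, loss_fn):
    # 1) Update target predictor (encoder frozen via detach).
    opt_t.zero_grad()
    loss_fn(tgt(enc(x).detach()), y).backward()
    opt_t.step()

    # 2) Update adversary (encoder frozen via detach).
    opt_a.zero_grad()
    loss_fn(adv(enc(x).detach()), s).backward()
    opt_a.step()

    # 3) Update encoder only: help the target, hurt the adversary.
    #    (Gradients also reach tgt/adv, but only opt_e steps here.)
    opt_e.zero_grad()
    z = enc(x)
    (loss_fn(tgt(z), y) - lam * loss_fn(adv(z), s)).backward()
    opt_e.step()
```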
Three Player Game: Linear Case
- The global solution is the degenerate point $(w_1, w_2, w_3)=(0, 0, 0)$: the encoder discards all information.
[Figure: equilibria of the three-player game — what we get vs. what we want]
- P. Roy and V.N. Boddeti, "Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach", CVPR 2019
Many Solutions for Bias Mitigation
- Standard Adversarial Representation Learning
- Linear Adversarial Measure: linear dependency between $Z$ and $S$ [ICCV 2019, CVPRW 2020]
- Non-Linear Adversarial Measure: Beyond linear dependency between $Z$ and $S$, but not all types [ECML 2021]
- Universal Dependence Measure: All types of dependency between $Z$ and $S$ [TMLR 2022]
- End-to-End Universal Dependence Measure: All types of dependency between $Z$ and $S$ [CVPR 2024]
Face Image Dataset
- Liu, Luo, Wang and Tang "Deep Learning Face Attributes in the Wild," ICCV 2015
CelebA Faces
- $Y$: high cheekbones (binary) and $S$: age and sex (continuous + binary)
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
Folktables
- $Y$: employment status (binary) and $S$: age (continuous)
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
How about zero-shot models?
Bias in CLIP's Zero-Shot Prediction
- Dehdashtian*, Wang* and Boddeti, "FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs," ICLR 2024
Debiasing CLIP Models
- Dehdashtian*, Wang* and Boddeti, "FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs," ICLR 2024
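FairerCLIP learns debiasing functions in an RKHS. As a much simpler linear stand-in (not the paper's method), one can estimate a sensitive direction in CLIP's feature space from group means and project it out before zero-shot scoring:

```python
import numpy as np

def debias_features(feats, s):
    # feats: (n, d) CLIP image features; s: (n,) binary sensitive attribute.
    # Sensitive direction: difference of group means (a crude linear proxy).
    v = feats[s == 1].mean(axis=0) - feats[s == 0].mean(axis=0)
    v /= np.linalg.norm(v)
    # Remove the component along v from every feature vector.
    return feats - np.outer(feats @ v, v)
```

This linear projection removes only first-order group information; FairerCLIP's kernel formulation targets non-linear dependence as well.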
FairerCLIP: CelebA Dataset
- $Y$: high cheekbones (binary)
- $S$: sex (binary)
FairerCLIP: FairFace Dataset
FairerCLIP: Chicago Face Dataset
- $Y$: attractiveness (binary)
- $S$: gender (binary)
FairerCLIP: Mitigating Spurious Correlation
- Evaluated both without and with ground-truth attribute labels.
Summary
- AI systems are progressing at a rapid pace.
- But they exhibit biases.
- We need methods for automated bias auditing of AI systems.
- The next generation of AI systems must be designed with bias mitigation built in.
- An appreciable gap remains between current solutions and ideal unbiased AI systems.