Measuring and Mitigating Bias in AI
Ethical AI and High-Performance Computing Seminar
Slides: hal.cse.msu.edu/talks
Vishnu Boddeti
Progress In Artificial Intelligence
Speech Processing
Image Analysis
Natural Language Processing
Physical Sciences
Key Drivers
Data, Compute, Algorithms
State of Affairs
(a report from the real world)
Real-world machine learning systems are effective, but they are biased, violate users' privacy, and are not trustworthy.
Research Questions
- Measure bias in AI models.
- Mitigate bias in AI models.
Measuring Bias in Datasets
How about Data?
- DataComp: In search of the next generation of multimodal datasets, NeurIPS D&B 2023
Measuring Hate Content in Text
- Birhane, Prabhu, Han and Boddeti, "On Hate Scaling Laws For Data-Swamps," arXiv:2306.13141
- Birhane, Prabhu, Han, Boddeti and Luccioni, "Into the LAION's Den: Investigating Hate in Multimodal Datasets," NeurIPS D&B Track 2023
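A minimal sketch of how one might flag hateful or toxic captions in a web-scale image-text dump, assuming the off-the-shelf Detoxify classifier; the papers' actual tooling, models, and thresholds may differ.

```python
# Sketch: flag potentially hateful captions with an off-the-shelf classifier.
# Assumes the `detoxify` package; this is illustrative, not the papers' pipeline.
from detoxify import Detoxify

captions = [
    "a photo of a person walking a dog",
    "an example caption scraped from the web",
]

model = Detoxify("original")       # pretrained toxicity/hate classifier
scores = model.predict(captions)   # dict of per-caption score lists

# Count captions whose toxicity score exceeds a chosen threshold.
threshold = 0.5
flagged = [c for c, s in zip(captions, scores["toxicity"]) if s > threshold]
print(f"{len(flagged)} / {len(captions)} captions flagged at threshold {threshold}")
```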
Troubling Trends in Dataset Scaling
Scale exacerbates hate content.
- Birhane, Prabhu, Han and Boddeti, "On Hate Scaling Laws For Data-Swamps," arXiv:2306.13141
- Birhane, Prabhu, Han, Boddeti and Luccioni, "Into the LAION's Den: Investigating Hate in Multimodal Datasets," NeurIPS D&B Track 2023
Measuring Bias in Discriminative Models
Narrative of AI Training: "Moar data! Much wow!"
- Birhane, Prabhu, Han and Boddeti, "On Hate Scaling Laws For Data-Swamps," arXiv:2306.13141
- Birhane*, Dehdashtian*, Prabhu and Boddeti, "The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models," FAccT 2024
Evaluation on 14 CLIP Models
- Birhane, Prabhu, Han and Boddeti, "On Hate Scaling Laws For Data-Swamps," arXiv:2306.13141
- Birhane*, Dehdashtian*, Prabhu and Boddeti, "The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models," FAccT 2024
Chicago Face Dataset
- human being
- animal
- gorilla
- chimpanzee
- orangutan
- thief
- criminal
- suspicious person
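A minimal sketch of the kind of zero-shot classification being audited here: a face image scored against the label set above. It assumes the Hugging Face `transformers` CLIP API and one specific checkpoint; the study itself evaluates 14 different CLIP models, and `face.jpg` is a hypothetical path.

```python
# Sketch: CLIP zero-shot classification of a face image into the labels above.
# The checkpoint and file path are illustrative assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

labels = ["human being", "animal", "gorilla", "chimpanzee",
          "orangutan", "thief", "criminal", "suspicious person"]
prompts = [f"a photo of a {label}" for label in labels]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("face.jpg")  # hypothetical path to one face image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # shape: (1, 8)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label:>18s}: {p:.3f}")
```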
Troubling Trends in Dataset Scaling
Scale exacerbates stereotypes.
- Birhane, Prabhu, Han and Boddeti, "On Hate Scaling Laws For Data-Swamps," arXiv:2306.13141
- Birhane*, Dehdashtian*, Prabhu and Boddeti, "The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models," FAccT 2024
Fairness: The Multi-Headed Hydra
- Verma and Rubin, "Fairness Definitions Explained," International Workshop on Software Fairness, 2018
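Fairness admits many, often incompatible, definitions. As one concrete instance, here is a sketch of two common group-fairness gaps (demographic parity and equalized odds) computed from binary predictions; the data and names are illustrative.

```python
# Sketch: demographic parity and equalized-odds gaps on illustrative data.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # ground-truth labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # model predictions
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # sensitive attribute (binary)

# Demographic parity gap: difference in positive-prediction rates across groups.
dp_gap = abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def rate(g, label):
    """Mean prediction within group g, restricted to examples with true label `label`."""
    mask = (group == g) & (y_true == label)
    return y_pred[mask].mean() if mask.any() else 0.0

# Equalized odds gap: largest difference in TPR or FPR across groups.
tpr_gap = abs(rate(0, 1) - rate(1, 1))
fpr_gap = abs(rate(0, 0) - rate(1, 0))

print(f"DP gap = {dp_gap:.2f}, EO gap = {max(tpr_gap, fpr_gap):.2f}")
```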
How Fair is Your ML Model?
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
How to Estimate these Trade-Offs?
U-FaTE (Utility-Fairness Trade-Off Estimator)
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
Face Image Dataset
- Liu, Luo, Wang and Tang, "Deep Learning Face Attributes in the Wild," ICCV 2015
CelebA Faces
- $Y$: high cheekbones (binary) and $S$: age and sex (continuous + binary)
Evaluation of over 1000 supervised image feature extractors.
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
CelebA Faces
- $Y$: high cheekbones (binary) and $S$: age and sex (continuous + binary)
Evaluation of over 100 zero-shot multimodal (CLIP) models.
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
Measuring Bias in Generative Models
High-Quality T2I Models, Same Old Stereotypes
OASIS: Toolbox for Measuring and Understanding Stereotypes
Lower, Yet Significant Stereotypes in Newer T2I Models
Nationality Worsens Existing Gender Stereotypes about Professions
Stereotypes worsen with compositional concepts.
T2I Models Have Stereotypical Predispositions about Nationalities
(Examples: Indian, Mexican)
Mitigating Bias in AI Systems
Mitigating Bias in Discriminative Models
From Fair Learning to Fair Representation Learning
$Z \perp \!\!\! \perp S \Rightarrow \hat{Y} \perp \!\!\! \perp S$
Learning Fair Representations
- Target Attribute: Smile & Demographic Attribute: Gender
- Problem Definition (a minimal setup sketch follows below):
  - Learn a representation $\mathbf{z} \in \mathbb{R}^d$ from data $\mathbf{x}$
  - Retain information necessary to predict the target attribute $\mathbf{t}\in\mathcal{T}$
  - Remove information related to a specified demographic attribute $\mathbf{s}\in\mathcal{S}$
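A minimal sketch of this setup in PyTorch, assuming simple placeholder MLP architectures (an assumption, not the papers' models); the piece that actually removes $\mathbf{s}$ is added via the measures discussed on the next slides.

```python
# Sketch: the fair representation-learning setup described above,
# with placeholder MLPs. Architectures here are illustrative assumptions.
import torch.nn as nn

class Encoder(nn.Module):
    """Maps input x to a representation z in R^d."""
    def __init__(self, in_dim, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, d))
    def forward(self, x):
        return self.net(x)

class TargetPredictor(nn.Module):
    """Predicts the target attribute t from z (retain useful information)."""
    def __init__(self, d, num_targets):
        super().__init__()
        self.head = nn.Linear(d, num_targets)
    def forward(self, z):
        return self.head(z)

# Still missing: a penalty that drives z to carry no information about the
# sensitive attribute s; the next slides contrast two ways to obtain one.
```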
A Fork in the Road
- Design a metric to measure sensitive demographic attribute information
  - non-parametric statistical dependence measures (see the HSIC sketch below)
- Learn a metric to measure semantic attribute information
  - probably feasible; many prior attempts
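As one concrete non-parametric dependence measure, here is a sketch of a (biased) empirical HSIC estimate between representations $Z$ and sensitive attributes $S$ with Gaussian kernels; the kernel choice and bandwidth are illustrative assumptions.

```python
# Sketch: biased empirical HSIC between Z and S with Gaussian kernels.
# For characteristic kernels, HSIC(Z, S) = 0 iff Z and S are independent.
import torch

def gaussian_gram(x, sigma=1.0):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = torch.cdist(x, x) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def hsic(z, s, sigma=1.0):
    n = z.shape[0]
    K = gaussian_gram(z, sigma)
    L = gaussian_gram(s, sigma)
    H = torch.eye(n) - torch.ones(n, n) / n     # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2

# Example: z contains s as one coordinate, so HSIC is clearly above zero.
s = torch.randn(128, 1)
z = torch.cat([s, torch.randn(128, 3)], dim=1)
print(float(hsic(z, s)))
```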
Adversarial Representation Learning
Game Theoretic Formulation
- Three player game between:
  - Encoder extracts features $\mathbf{z}$
  - Target Predictor for the desired task from features $\mathbf{z}$
  - Adversary extracts sensitive information from features $\mathbf{z}$

$$
\min_{\mathbf{\Theta}_E,\mathbf{\Theta}_T} \underbrace{\color{cyan}{J_t(\mathbf{\Theta}_E,\mathbf{\Theta}_T)}}_{\color{cyan}{\text{error of target}}} \quad \text{s.t.} \quad \min_{\mathbf{\Theta}_A} \underbrace{\color{orange}{J_s(\mathbf{\Theta}_E,\mathbf{\Theta}_A)}}_{\color{orange}{\text{error of adversary}}} \geq \alpha
$$

- Adversary: learned measure of semantic attribute information
How do we learn model parameters?
- Simultaneous/Alternating Stochastic Gradient Descent (see the training-step sketch below):
  - Update target while keeping encoder and adversary frozen.
  - Update adversary while keeping encoder and target frozen.
  - Update encoder while keeping target and adversary frozen.
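A minimal sketch of one round of these alternating updates in PyTorch, assuming the `Encoder`/`TargetPredictor` modules from the earlier sketch plus an `Adversary` head; the loss weight `lam` and the optimizer setup are illustrative assumptions.

```python
# Sketch: one round of the three alternating updates (target, adversary, encoder).
# t and s are integer class labels; module/optimizer names and `lam` are assumptions.
import torch.nn.functional as F

def training_step(x, t, s, encoder, target_head, adversary,
                  opt_target, opt_adv, opt_enc, lam=1.0):
    # 1) Update target predictor; encoder and adversary stay frozen.
    z = encoder(x).detach()
    opt_target.zero_grad()
    F.cross_entropy(target_head(z), t).backward()
    opt_target.step()

    # 2) Update adversary; encoder and target stay frozen.
    z = encoder(x).detach()
    opt_adv.zero_grad()
    F.cross_entropy(adversary(z), s).backward()
    opt_adv.step()

    # 3) Update encoder; target and adversary stay frozen. The encoder lowers
    #    the target loss while *raising* the adversary's loss (minus sign).
    z = encoder(x)
    enc_loss = F.cross_entropy(target_head(z), t) - lam * F.cross_entropy(adversary(z), s)
    opt_enc.zero_grad()
    enc_loss.backward()
    opt_enc.step()
```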
Three Player Game: Linear Case
- Global solution is $(w_1, w_2, w_3)=(0, 0, 0)$
What we get
What we want
- P. Roy and V.N. Boddeti, "Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach," CVPR 2019
Many Solutions for Bias Mitigation
- Standard Adversarial Representation Learning
- Linear Adversarial Measure: linear dependency between $Z$ and $S$ [ICCV 2019, CVPRW 2020]
- Non-Linear Adversarial Measure: beyond linear dependency between $Z$ and $S$, but not all types [ECML 2021]
- Universal Dependence Measure: all types of dependency between $Z$ and $S$ [TMLR 2022]
- End-to-End Universal Dependence Measure: all types of dependency between $Z$ and $S$ [CVPR 2024]
Face Image Dataset
- Liu, Luo, Wang and Tang, "Deep Learning Face Attributes in the Wild," ICCV 2015
CelebA Faces
- $Y$: high cheekbones (binary) and $S$: age and sex (continuous + binary)
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
Folktables
- $Y$: employment status (binary) and $S$: age (continuous)
- Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
- Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024
How about zero-shot models?
Bias in CLIP's Zero-Shot Prediction
- Dehdashtian*, Wang* and Boddeti, "FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs," ICLR 2024
Debiasing CLIP Models
- Dehdashtian*, Wang* and Boddeti, "FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs," ICLR 2024
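To make the general idea of debiasing frozen features concrete, here is a simple linear-projection baseline that removes sensitive-attribute directions from CLIP image embeddings. Note this is not FairerCLIP itself, which learns debiasing functions in reproducing kernel Hilbert spaces; all shapes and names below are illustrative.

```python
# Sketch: a linear-projection baseline for debiasing frozen CLIP image features.
# NOT FairerCLIP (which operates with functions in RKHSs); illustration only.
import numpy as np

def remove_sensitive_directions(Z, S, k=1):
    """Project out the top-k directions of Z most covarying with S.

    Z: (n, d) frozen CLIP features; S: (n, m) sensitive-attribute labels.
    """
    Zc = Z - Z.mean(axis=0)
    Sc = S - S.mean(axis=0)
    U, _, _ = np.linalg.svd(Zc.T @ Sc, full_matrices=False)  # (d, m)
    B = U[:, :k]                                              # top-k directions
    return Z - (Z @ B) @ B.T                                  # orthogonal projection

# Usage with illustrative shapes: debiased features can then be scored against
# CLIP text prompts for zero-shot prediction as usual.
Z = np.random.randn(512, 768)                       # frozen image embeddings
S = np.random.randint(0, 2, size=(512, 1)).astype(float)
Z_debiased = remove_sensitive_directions(Z, S, k=1)
```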
FairerCLIP: CelebA Dataset
- $Y$: high cheekbones (binary)
- $S$: sex (binary)
FairerCLIP: FairFace Dataset
FairerCLIP: Chicago Face Dataset
- $Y$: attractiveness (binary)
- $S$: gender (binary)
(Partially) Mitigating Bias in Generative Models
Sampling Diverse Modes from Generative Models
Mitigating Stereotypes in T2I Models
Sampling Diverse Data from Generative Models
Diverse Mode Coverage
Summary
- AI systems are progressing at a rapid pace.
- But they exhibit biases.
- We need methods for automated auditing of AI systems for bias.
- The next generation of AI systems has to be designed with bias mitigation in mind.
- An appreciable gap exists between current solutions and ideal unbiased AI systems.