Fairness and Privacy in Artificial Intelligence
RAISE
Vishnu Boddeti
March 24, 2021
Progress In Artificial Intelligence
Speech Processing
Image Analysis
Natural Language Processing
Robotics
Key Driver
Data, Compute, Algorithms
State of Affairs
(reports from the real world)
"Tay, Microsoft's AI chatbot, gets a crash course in racism from Twitter"
March 24, 2016
"FaceApp's creator apologizes for the app's skin-lightening 'hot' filter"
April 25, 2017
"Facial recognition is accurate, if you're a white guy"
Feb. 09, 2018
lighter faces: 0.7% error
darker faces: 12.9% error
- J. Buolamwini and T. Gebru, "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," FAT* 2018
"The Secretive Company That Might End Privacy as We Know It"
Jan. 18, 2020
Real-world artificial intelligence systems are effective, but they
- are biased,
- violate users' privacy, and
- are not trustworthy.
Today's Agenda
Build effective AI systems that are fair and trustworthy.
Fair and Trustworthy AI
Mechanism: control semantic information in data representations
100 Years of Data Representations
Control Mathematical Concepts
variance, sparsity, translation, rotation, scale, etc.
Bias in Learning
- J. Buolamwini and T. Gebru, "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," FAT* 2018
Privacy Leakage
[Figure: training and inference with a Microsoft smile-classification model, contrasting the target task with the resulting privacy leakage]
- B. Sadeghi, L. Wang, V.N. Boddeti, "Adversarial Representation Learning With Closed-Form Solvers," CVPRW 2020
Dark Secret of Deep Learning
Deep learning recklessly absorbs all statistical correlations in the data.
So What?
- Demographic Bias: lighter faces 0.7% error vs. darker faces 12.9% error
- Overfitting to Domain: classification/regression performance vs. domain
- Privacy Leakage: smile 93.1% vs. gender 82.9%
- Lack of Robustness: classification vs. degradation
Next Era of Data Representations
Control Semantic Concepts
age, gender, domain, etc.
Controlling Semantic Information
- Target Concept: Smile & Private Concept: Gender
- Problem Definition:
- Learn a representation $\mathbf{z} \in \mathbb{R}^d$ from data $\mathbf{x}$
- Retain information necessary to predict target attribute $\mathbf{t}\in\mathcal{T}$
- Remove information about a given sensitive attribute $\mathbf{s}\in\mathcal{S}$
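As a concrete illustration of this problem setup, the sketch below shows one common mechanism, adversarial representation learning: an encoder produces $\mathbf{z}$, a target head predicts $\mathbf{t}$ from $\mathbf{z}$, and an adversary tries to recover $\mathbf{s}$; the encoder is trained to help the former and hinder the latter. The network sizes, optimizer settings, and trade-off weight are illustrative assumptions, not the exact formulation from the cited papers.

```python
# Minimal adversarial representation learning sketch (illustrative; not the
# exact formulation from the cited papers). The encoder maps x -> z, a target
# head predicts t from z, and an adversary predicts the sensitive attribute s
# from z. The encoder is trained to retain t while removing s.
import torch
import torch.nn as nn

d_x, d_z, n_target, n_sensitive = 128, 32, 2, 2   # illustrative sizes

encoder = nn.Sequential(nn.Linear(d_x, 64), nn.ReLU(), nn.Linear(64, d_z))
target_head = nn.Linear(d_z, n_target)    # predicts t (e.g., smile)
adversary = nn.Linear(d_z, n_sensitive)   # predicts s (e.g., gender)

opt_main = torch.optim.Adam(list(encoder.parameters()) +
                            list(target_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()
alpha = 1.0  # utility-vs-leakage trade-off

def train_step(x, t, s):
    # 1) Train the adversary to predict s from the current (detached) representation.
    loss_adv = ce(adversary(encoder(x).detach()), s)
    opt_adv.zero_grad(); loss_adv.backward(); opt_adv.step()

    # 2) Train encoder + target head: keep information about t, discard s.
    z = encoder(x)
    loss = ce(target_head(z), t) - alpha * ce(adversary(z), s)
    opt_main.zero_grad(); loss.backward(); opt_main.step()
    return loss.item()
```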
Technical Challenge
- Can we explicitly control semantic information in learned representations?
- If so, how do we explicitly control it?
The Can
Short Answer: Yes, we can, sometimes.
A Subspace Geometry Perspective
- Case 1: when $\mathcal{S} \perp \!\!\! \perp \mathcal{T}$ (e.g., Gender and Age)
- Case 2: when $\mathcal{S} \not\perp \!\!\! \perp \mathcal{T}$ (e.g., Car and Wheels)
- Case 3: when $\mathcal{S} \sim \mathcal{T}$ ($\mathcal{T}\subseteq\mathcal{S}$)
- B. Sadeghi, L. Wang, V.N. Boddeti, "Adversarial Representation Learning With Closed-Form Solvers," CVPRW 2020
The How
Short Answer: It depends.
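One example of "it depends": when the encoder is restricted to a linear map, the tension between retaining the target and discarding the sensitive attribute can be resolved in closed form via an eigendecomposition. The sketch below is a simplified linear variant of that idea under an orthonormality constraint; it is an illustration, not the exact solver from the cited CVPRW 2020 paper.

```python
# Simplified closed-form sketch: learn a linear projection z = W^T x whose
# directions align with the target labels and avoid directions predictive of
# the sensitive labels. Maximizing tr(W^T (Ct - lam*Cs) W) subject to
# W^T W = I is solved by the top eigenvectors of (Ct - lam*Cs). Illustrative
# only; not the exact formulation from the cited paper.
import numpy as np

def fair_linear_projection(X, Yt, Ys, r=8, lam=1.0):
    """X: (n, d) centered features; Yt, Ys: (n, k) one-hot target/sensitive labels."""
    At = X.T @ Yt                       # (d, k_t) cross-covariance with target
    As = X.T @ Ys                       # (d, k_s) cross-covariance with sensitive
    M = At @ At.T - lam * (As @ As.T)   # reward target alignment, penalize sensitive
    eigvals, eigvecs = np.linalg.eigh(M)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:r]]   # top-r eigenvectors
    return W                            # representation: Z = X @ W

# Usage (hypothetical arrays): Z = X @ fair_linear_projection(X, Yt, Ys, r=8, lam=2.0)
```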
Application-1: Fair Classification
- UCI Adult Dataset (target: income; sensitive: gender)
Method | Income (%) | Gender (%) | $\Delta^*$
Raw Data | 84.3 | 98.2 | 22.8
Remove Gender | 84.2 | 83.6 | 16.1
Our Approach | 84.1 | 67.4 | 0.0
$^*$ Absolute difference between adversary accuracy and random chance
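For concreteness, a hedged sketch of how such a leakage score can be estimated: train a probe classifier on the learned representation to predict the sensitive attribute and report its gap to a chance baseline. The logistic-regression probe and the majority-class baseline are illustrative choices, not necessarily the exact protocol behind the tables.

```python
# Illustrative leakage probe: train a classifier on the representation Z to
# predict the sensitive attribute s, and report the gap between its accuracy
# and a chance baseline (majority-class rate here). Probe and baseline are
# illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def leakage_delta(Z, s):
    Z_tr, Z_te, s_tr, s_te = train_test_split(Z, s, test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(Z_tr, s_tr)
    acc = probe.score(Z_te, s_te)
    chance = np.bincount(s_te).max() / len(s_te)   # majority-class baseline
    return 100.0 * abs(acc - chance)               # Delta, in percentage points
```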
Fair Classification: Interpreting Encoder Weights
Embedding Weights (Adult Dataset)
Application-2: Mitigating Privacy Leakage
- CelebA Dataset (target: smile; sensitive: gender)
Method | Smile (%) | Gender (%) | $\Delta^*$
Raw Data | 93.1 | 82.9 | 21.5
Zero-Sum Game | 91.8 | 72.5 | 11.1
Our Approach | 92.5 | 61.4 | 0.0
$^*$ Absolute difference between adversary accuracy and random chance
Application-3: Mitigating Privacy Leakage
Application-4: Illumination Invariance
- 38 identities and 5 illumination directions
- Target: Identity Label
- Sensitive: Illumination Label
Method | $s$ (lighting) | $t$ (identity)
Raw Data | 96 | 78
NN + MMD (NeurIPS 2014) | - | 82
VFAE (ICLR 2016) | 57 | 85
Zero-Sum Game (NeurIPS 2017) | 57 | 89
Our Approach | 20 | 86
Privacy-Preserving AI
Mechanism: control access to information in data representations
Privacy Leakage in Augmented Reality
- Pittaluga et al., "Revealing Scenes by Inverting Structure from Motion Reconstructions," CVPR 2019
Information Leakage from Gradients
- Yonetani et al., "Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption," ICCV 2017
Learning from Private Data: Federated Learning
- Distributed learning of model parameters from private data.
- Clients download the current global model $\bar{\mathbf{w}}_t$.
- Each client updates the model on its own local data.
- The aggregator updates the global model from the client updates (see the sketch below).
- Yonetani et al., "Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption," ICCV 2017
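A minimal federated-averaging sketch of the steps above, using a plain logistic-regression client update. All names and hyperparameters are illustrative, and the protection of the exchanged updates (the homomorphic encryption in the cited ICCV 2017 work) is omitted.

```python
# Minimal federated-averaging sketch of the steps listed above. Each client
# runs a few logistic-regression gradient steps on its own private data; the
# aggregator averages the returned models. The encryption of the exchanged
# updates used in the cited ICCV 2017 work is omitted here.
import numpy as np

def local_update(w_global, X, y, lr=0.1, epochs=5):
    """One client: gradient steps on its private (X, y), starting from the global model."""
    w = w_global.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # logistic predictions
        w -= lr * X.T @ (p - y) / len(y)        # gradient of the logistic loss
    return w

def federated_round(w_global, clients):
    """Aggregator: average the locally updated models (FedAvg)."""
    return np.mean([local_update(w_global, X, y) for X, y in clients], axis=0)

# clients = [(X_1, y_1), (X_2, y_2), ...]   # each pair never leaves its device
# w = np.zeros(d)
# for t in range(num_rounds):
#     w = federated_round(w, clients)
```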
So What?
$\dots$ consent should be given for all purposes $\dots$
Encryption: The Holy Grail?
- Data encryption is an attractive option for protection.
- protects users' privacy
- enables free and open sharing
- mitigates legal and ethical issues
Goal
efficient learning directly from encrypted data
efficient inference directly on encrypted data
Learning from Private Data
Homomorphic Encryption for Learning Sparse Models
Facial Attribute Recognition
Method | Accuracy (%) | Privacy
LLWT15 | 87 | No
DP | 78 | Yes
DP+SGD | 64 | Yes
Our Approach | 84 | Yes
Sensitive Place Recognition
Method | Average Precision | Privacy
DP | 0.546 | Yes
DP+SGD | 0.704 | Yes
Our Approach | 0.729 | Yes
- Yonetani et al., "Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption," ICCV 2017
Inference on Private Data
Amortized Homomorphically Encrypted Inner Products
- V.N. Boddeti, "Secure Face Matching Using Fully Homomorphic Encryption," BTAS 2018
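To make the primitive concrete, below is a toy additively homomorphic (Paillier-style) inner product between a client's encrypted integer feature vector and the server's plaintext integer weights. This is only an illustration of computing on ciphertexts: the cited BTAS 2018 work uses lattice-based fully homomorphic encryption with amortized (packed) inner products, and the primes below are insecure demo values.

```python
# Toy additively homomorphic (Paillier-style) encrypted inner product between
# an encrypted integer feature vector and plaintext integer weights. The
# cited paper uses lattice-based fully homomorphic encryption; this sketch
# only illustrates the idea, with insecure demo parameters.
import math, random

p, q = 104729, 104723                   # small demo primes (NOT secure)
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
L = lambda x: (x - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    m = (L(pow(c, lam, n2)) * mu) % n
    return m - n if m > n // 2 else m   # map back to signed integers

def enc_dot(enc_x, w):
    """Homomorphic inner product sum_i w_i * x_i, computed only on ciphertexts."""
    acc = encrypt(0)
    for c, wi in zip(enc_x, w):
        acc = (acc * pow(c, wi % n, n2)) % n2   # Enc(a) * Enc(x)^w = Enc(a + w*x)
    return acc

x = [3, -1, 4, 2]                        # client's (quantized) feature vector
w = [2, 5, -1, 3]                        # server's plaintext weights
enc_x = [encrypt(v) for v in x]          # only ciphertexts leave the client
assert decrypt(enc_dot(enc_x, w)) == sum(a * b for a, b in zip(x, w))
```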
Open Problems
- How do we mitigate bias in AI?
- How do we ensure AI systems do not violate user privacy?
- Understand fundamental trade-off between utility and fairness.
- Understand fundamental trade-off between utility and privacy.
- $\dots$
Summary
- Today's AI systems are biased and violate users' privacy.
- How do we make AI systems fair and privacy-preserving?
- Many unanswered open questions and practical challenges.
Human Analysis Lab
VishnuBoddeti