Towards Fairness in Biometric Systems: Fundamental Trade-Offs and Algorithms
German Biometrics Working Group Meeting
Vishnu Boddeti
September 14, 2020
VishnuBoddeti
Progress In Biometrics
Key Driver
Data, Compute, Algorithms
State-of-Affairs
(reports from the real world)
"Tay, Microsoft's AI chatbot, gets a crash course in racism from Twitter"
March 24, 2016
"FaceApp's creator apologizes for the app's skin-lightening 'hot' filter"
April 25, 2017
"Facial recognition is accurate, if you're a white guy"
Feb. 09, 2018
lighter faces: 0.7% error
darker faces: 12.9% error
- Buolamwini and Gebru, "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," FAT 2018
"The Secretive Company That Might End Privacy as We Know It"
Jan. 18, 2020
Real-world biometric recognition systems are effective, but they
are biased,
violate users' privacy, and
are not trustworthy.
Today's Agenda
Build biometric systems that are fair and trustworthy.
Fair and Trustworthy ML
Mechanism: control semantic information in data representations
100 Years of Data Representations
Control Mathematical Concepts
variance, sparsity, translation, rotation, scale, etc.
Bias in Learning
- Buolamwini and Gebru, "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," FAT 2018
Privacy Leakage
- Training:
- Inference: Microsoft Smile classification
- Target Task
- Privacy Leakage
- B. Sadeghi, L. Wang, V.N. Boddeti, "Adversarial Representation Learning With Closed-Form Solvers," CVPRW 2020
Information Leakage from Representations
- Learned Embeddings:
- Attacks on Embeddings:
Face reconstruction from template
Privacy leakage through attribute prediction from template
- Mai et al., "On the Reconstruction of Face Images from Deep Face Templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018
Dark Secret of Deep Learning
Recklessly absorb all statistical correlations in data
Next Era of Biometric Representations
Control Semantic Concepts
age, gender, domain, etc.
Controlling Semantic Information
- Target Concept: Smile & Private Concept: Gender
- Problem Definition:
- Learn a representation $\mathbf{z} \in \mathbb{R}^d$ from data $\mathbf{x}$
- Retain information necessary to predict target attribute $\mathbf{t}\in\mathcal{T}$
- Remove information related to a desired sensitive attribute $\mathbf{s}\in\mathcal{S}$
Technical Challenge
- How to explicitly control semantic information in learned representations?
- Can we explicitly control semantic information in learned representations?
The Can
Short Answer: Yes, we can, sometimes.
A Subspace Geometry Perspective
- Case 1: when $\mathcal{S} \perp \!\!\! \perp \mathcal{T}$ (Gender, Age)
- Case 2: when $\mathcal{S} \not\perp \!\!\! \perp \mathcal{T}$ (Car, Wheels)
- Case 3: when $\mathcal{S} \sim \mathcal{T}$ ($\mathcal{T}\subseteq\mathcal{S}$)
- B. Sadeghi, L. Wang, V.N. Boddeti, "Adversarial Representation Learning With Closed-Form Solvers," CVPRW 2020
The How
Short Answer: It depends.
A Fork in the Road
- Design metric to measure semantic attribute information
- Learn metric to measure semantic attribute information
Adversarial Representation Learning
Game Theoretic Formulation
- Three player game between:
- Encoder extracts features $\mathbf{z}$
- Target Predictor for desired task from features $\mathbf{z}$
- Adversary extracts sensitive information from features $\mathbf{z}$
\begin{equation}
\begin{aligned}
\min_{\mathbf{\Theta}_E,\mathbf{\Theta}_T} & \underbrace{\color{cyan}{J_t(\mathbf{\Theta}_E,\mathbf{\Theta}_T)}}_{\color{cyan}{\mbox{error of target}}} \quad s.t. \mbox{ } \min_{\mathbf{\Theta}_A} \underbrace{\color{orange}{J_s(\mathbf{\Theta}_E,\mathbf{\Theta}_A)}}_{\color{orange}{\mbox{error of adversary}}} \geq \alpha \nonumber
\end{aligned}
\end{equation}
- Adversary: learned measure of semantic attribute information
How do we learn model parameters?
- Simultaneous/Alternating Stochastic Gradient Descent
- Update target while keeping encoder and adversary frozen.
- Update adversary while keeping encoder and target frozen.
- Update encoder while keeping target and adversary frozen.
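The three alternating updates above can be sketched for the simplest case, where every player is linear and both losses are mean squared errors. This is a minimal illustration under stated assumptions, not the training code behind the talk: the function names, the squared-error losses, the weight `alpha` on the adversary's loss, the learning rate, and the toy data are all introduced here for illustration.

```python
import numpy as np

def losses(X, t, s, W_E, w_T, w_A):
    """Return (J_t, J_s): mean squared errors of target predictor and adversary."""
    Z = X @ W_E                                   # encoder output z for every sample
    return np.mean((Z @ w_T - t) ** 2), np.mean((Z @ w_A - s) ** 2)

def update_target(X, t, W_E, w_T, lr):
    """Gradient step on J_t w.r.t. the target predictor; encoder is frozen."""
    Z = X @ W_E
    return w_T - lr * (2.0 / len(t)) * Z.T @ (Z @ w_T - t)

def update_adversary(X, s, W_E, w_A, lr):
    """Gradient step on J_s w.r.t. the adversary; encoder is frozen."""
    Z = X @ W_E
    return w_A - lr * (2.0 / len(s)) * Z.T @ (Z @ w_A - s)

def update_encoder(X, t, s, W_E, w_T, w_A, alpha, lr):
    """Gradient step on J_t - alpha * J_s w.r.t. the encoder; both heads frozen."""
    Z = X @ W_E
    r_t = Z @ w_T - t                             # target residuals
    r_s = Z @ w_A - s                             # adversary residuals
    grad = (2.0 / len(t)) * X.T @ (np.outer(r_t, w_T) - alpha * np.outer(r_s, w_A))
    return W_E - lr * grad

def alternating_round(X, t, s, W_E, w_T, w_A, alpha=0.5, lr=0.05):
    """One round: target, then adversary, then encoder."""
    w_T = update_target(X, t, W_E, w_T, lr)
    w_A = update_adversary(X, s, W_E, w_A, lr)
    W_E = update_encoder(X, t, s, W_E, w_T, w_A, alpha, lr)
    return W_E, w_T, w_A

# Toy data (assumed): target depends on feature 0, sensitive attribute on feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
t = X[:, 0] + 0.1 * rng.normal(size=200)
s = X[:, 1] + 0.1 * rng.normal(size=200)
W_E = 0.1 * rng.normal(size=(3, 2))
w_T = np.zeros(2)
w_A = np.zeros(2)
for _ in range(5):
    W_E, w_T, w_A = alternating_round(X, t, s, W_E, w_T, w_A)
```

Each update is an ordinary gradient step on one player's objective with the other two held fixed; nothing in this loop guarantees that the three players settle at an equilibrium.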
Three Player Game: Linear Case
- Global solution is $(w_1, w_2, w_3)=(0, 0, 0)$
What we get
What we want
Our Contributions
- Non-Zero Sum Formulation for Iterative Methods (CVPR'19)
- Standard setting, each player is a deep neural network.
- Local optima
- Global Optima for Kernel Methods (ICCV'19)
- Simplified setting, each player is linear.
- closed form solution + stable + performance bounds
- Hybrid Model with CNNs and Closed-Form Solvers (CVPRW'20)
- Standard setting, encoder is a deep neural network, other players are closed-form solvers.
- Local optima
Optimizing Likelihood Can be Sub-Optimal
Adversary
Encoder
Equilibrium
- Limitations:
- Encoder's target distribution leaks information!
- Practice: simultaneous SGD does not reach equilibrium
- Class Imbalance: likelihood biases solution to majority class
Maximum Entropy ARL Continued...
- Three player game between:
- Encoder extracts features $\mathbf{z}$
- Target Predictor for desired task from features $\mathbf{z}$
- Adversary extracts sensitive information from features $\mathbf{z}$
- Three Player Non-Zero Sum Game:
\begin{equation}
\begin{aligned}
\min_{\mathbf{\theta}_A} & \mbox{ } \underbrace{\color{orange}{J_1(\mathbf{\theta}_E,\mathbf{\theta}_A)}}_{\color{orange}{\mbox{error of adversary}}} \\
\min_{\mathbf{\theta}_E,\mathbf{\theta}_T} & \mbox{ } \underbrace{\color{cyan}{J_2(\mathbf{\theta}_E,\mathbf{\theta}_T)}}_{\color{cyan}{\mbox{error of target}}} - \alpha \underbrace{\color{orange}{J_3(\mathbf{\theta}_E,\mathbf{\theta}_A)}}_{\color{orange}{\mbox{entropy of adversary}}} \nonumber
\end{aligned}
\end{equation}
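The two objectives of the non-zero-sum game can be written out for a classification setting. The sketch below assumes that $J_1$ and $J_2$ are cross-entropy losses and that $J_3$ is the mean entropy of the adversary's predicted class distribution; the helper names and the toy logits are hypothetical and only serve to show how the terms combine.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer labels."""
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

def entropy(probs):
    """Mean Shannon entropy (in nats) of each row of probs."""
    return -np.mean(np.sum(probs * np.log(probs + 1e-12), axis=1))

def adversary_objective(adv_logits, s):
    """J_1: the adversary minimizes its own prediction error."""
    return cross_entropy(softmax(adv_logits), s)

def encoder_objective(tgt_logits, t, adv_logits, alpha):
    """J_2 - alpha * J_3: target error minus weighted entropy of the adversary."""
    return cross_entropy(softmax(tgt_logits), t) - alpha * entropy(softmax(adv_logits))

# Toy check (assumed values): an uncertain adversary is better for the encoder.
t = np.array([0, 1, 0, 1])
tgt_logits = np.array([[2.0, -2.0], [-2.0, 2.0], [2.0, -2.0], [-2.0, 2.0]])
uniform_adv = np.zeros((4, 2))                  # adversary outputs 50/50
confident_adv = np.array([[5.0, -5.0]] * 4)     # adversary is confident
obj_uniform = encoder_objective(tgt_logits, t, uniform_adv, alpha=1.0)
obj_confident = encoder_objective(tgt_logits, t, confident_adv, alpha=1.0)
```

Because the encoder is rewarded for the adversary's uncertainty rather than for the adversary's error, its objective is not the negative of $J_1$, which is what makes the game non-zero-sum.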
Geometry of Optimization
\begin{equation}
\begin{aligned}
\min_{\mathbf{\Theta}_E} & \ \ {\color{cyan}{J_t(\mathbf{\Theta}_E)}} \\
\mathrm {s.t. \ \ } & {\color{orange}{J_s (\mathbf{\Theta}_E) \ge \alpha}} \nonumber
\end{aligned}
\end{equation}
- Non-convexity: feasible set is non-convex
- Non-differentiability: solution is either a plane or a line
B. Sadeghi, R. Yu, V.N. Boddeti, "On the Global Optima of Kernelized Adversarial Representation Learning," ICCV 2019
Solution: Spectral Adversarial Representation Learning
- Lagrangian formulation:
\begin{equation}
\min_{\mathbf{\Theta}_E} \Big\{(1-\lambda){\color{cyan}{J_t(\mathbf{\Theta}_E)}}- (\lambda) {\color{orange}{J_s (\mathbf{\Theta}_E)} }\Big\} \nonumber
\end{equation}
Non-Convex + Non-Differentiable
- Solution:
\begin{equation}
\mathbf{\Theta}_E, r^*=\mbox{Negative Eig} \Big\{\mathbf{X}\left(\lambda \color{orange}{\mathbf{S}^T \mathbf{S}} - (1-\lambda)\color{cyan}{\mathbf{Y}^T \mathbf{Y}} \right)\mathbf{X}^T \Big\}\nonumber
\end{equation}
Global Optima + Optimal Dimensionality + Performance Bounds
B. Sadeghi, R. Yu, V.N. Boddeti, "On the Global Optima of Kernelized Adversarial Representation Learning," ICCV 2019
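The closed-form solution on this slide can be evaluated literally with an eigendecomposition. The following is a sketch under assumptions, not the published implementation: it takes $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{S}$ to store one sample per column, mean-centers them, and omits any normalization or regularization the full method may use. The function name `spectral_encoder` and the tolerance are hypothetical.

```python
import numpy as np

def spectral_encoder(X, Y, S, lam, rel_tol=1e-10):
    """X: (d, n) data, Y: (p, n) target, S: (q, n) sensitive; columns are samples.

    Returns (Theta, r): eigenvectors belonging to the negative eigenvalues of
    X (lam * S^T S - (1 - lam) * Y^T Y) X^T, and how many there are.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Sc = S - S.mean(axis=1, keepdims=True)
    M = Xc @ (lam * Sc.T @ Sc - (1.0 - lam) * Yc.T @ Yc) @ Xc.T   # (d, d), symmetric
    evals, evecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    tol = rel_tol * max(np.abs(evals).max(), 1.0) # guard against round-off zeros
    negative = evals < -tol
    return evecs[:, negative], int(negative.sum())

# Toy data (assumed): three mutually orthogonal, zero-mean features.
t_vec = np.array([1.0, 1.0, -1.0, -1.0])          # target attribute
s_vec = np.array([1.0, -1.0, 1.0, -1.0])          # sensitive attribute
noise = np.array([1.0, -1.0, -1.0, 1.0])          # unrelated feature
X = np.vstack([t_vec, s_vec, noise])              # (3, 4)
Theta, r = spectral_encoder(X, t_vec[None, :], s_vec[None, :], lam=0.5)
Z = Theta.T @ X                                   # (r, 4) embedding
```

On this toy input the only negative eigenvalue belongs to the feature that carries the target, so the embedding keeps that direction and is orthogonal to the sensitive attribute.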
Closed-Form Solvers
- Encoder extracts features $\mathbf{z}$
- Target Predictor: kernel ridge regressor to predict target from $\mathbf{z}$
- Adversary: kernel ridge regressor to extract sensitive information from $\mathbf{z}$
B. Sadeghi, L. Wang, V.N. Boddeti, "Adversarial Representation Learning With Closed-Form Solvers," CVPRW 2020
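A kernel ridge regressor has a closed-form fit, which is what lets the target predictor and the adversary be solved exactly for a given embedding instead of being trained by gradient steps. The sketch below assumes a Gaussian (RBF) kernel; the kernel choice, the bandwidth `sigma`, the ridge weight `gamma`, and the function names are illustrative assumptions rather than details taken from the slides.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """A: (n, r), B: (m, r) -> (n, m) Gaussian kernel matrix."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

def krr_fit(Z, y, gamma=1e-6, sigma=1.0):
    """Closed-form dual coefficients: solve (K + gamma * I) coef = y."""
    K = rbf_kernel(Z, Z, sigma)
    return np.linalg.solve(K + gamma * np.eye(len(Z)), y)

def krr_predict(Z_train, coef, Z_new, sigma=1.0):
    """Predict at Z_new from the training embeddings and fitted coefficients."""
    return rbf_kernel(Z_new, Z_train, sigma) @ coef

# Toy embedding (assumed): five one-dimensional features and a smooth label.
Z = np.arange(5.0)[:, None]
y = np.sin(Z[:, 0])
coef = krr_fit(Z, y)
y_hat = krr_predict(Z, coef, Z)
```

The same solver can play either role: fitting it to the target labels gives the target predictor, and fitting it to the sensitive labels gives the adversary.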
Properties of Ideal Embedding
- Embedding Dimensionality
- Number of negative eigenvalues of
\begin{equation}
\mathbf{B} = \lambda \tilde{\mathbf{S}}^T \tilde{\mathbf{S}} -(1-\lambda)\tilde{\mathbf{Y}}^T \tilde{\mathbf{Y}}
\end{equation}
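How the embedding dimensionality reacts to $\lambda$ can be seen by counting negative eigenvalues of $\mathbf{B}$ on a toy example. This sketch assumes the tildes denote mean-centered label matrices with one sample per column; the function name `embedding_dim`, the tolerance, and the four-sample labels are hypothetical.

```python
import numpy as np

def embedding_dim(Y, S, lam, rel_tol=1e-10):
    """Count negative eigenvalues of B = lam * S^T S - (1 - lam) * Y^T Y.

    Y: (p, n) target labels, S: (q, n) sensitive labels, both already centered.
    """
    B = lam * S.T @ S - (1.0 - lam) * Y.T @ Y     # (n, n), symmetric
    evals = np.linalg.eigvalsh(B)
    tol = rel_tol * max(np.abs(evals).max(), 1.0) # ignore round-off around zero
    return int(np.sum(evals < -tol))

t = np.array([[1.0, 1.0, -1.0, -1.0]])            # zero-mean target
s_indep = np.array([[1.0, -1.0, 1.0, -1.0]])      # orthogonal to the target
s_same = t.copy()                                 # identical to the target

dims_indep = [embedding_dim(t, s_indep, lam) for lam in (0.0, 0.5, 1.0)]
dims_same = [embedding_dim(t, s_same, lam) for lam in (0.25, 0.5, 0.75)]
```

With an orthogonal sensitive attribute the count stays at one until $\lambda = 1$, where $\mathbf{B}$ is positive semidefinite and nothing is kept. When the sensitive attribute equals the target, $\mathbf{B} = (2\lambda - 1)\,\tilde{\mathbf{Y}}^T\tilde{\mathbf{Y}}$, so the count drops to zero as soon as $\lambda \ge 0.5$.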
Bounds on Trade-Off
Application-1: Fair Classification
- UCI Adult Dataset (income, gender)
| Method | Income | Gender | $\Delta^*$ |
|---|---|---|---|
| Raw Data | 84.3 | 98.2 | 22.8 |
| Remove Gender | 84.2 | 83.6 | 16.1 |
| Zero-Sum game | 84.4 | 67.7 | 0.3 |
| Non-Zero-Sum Game | 84.6 | 67.3 | 0.1 |
| Global-Optima | 84.1 | 67.4 | 0.0 |
| Hybrid | 83.8 | 67.4 | 0.0 |
$^*$ Absolute difference between adversary accuracy and random chance
Fair Classification: Interpreting Encoder Weights
Embedding Weights (Adult Dataset)
Application-2: Mitigating Privacy Leakage
- CelebA Dataset (smile, gender)
| Method | Smile | Gender | $\Delta^*$ |
|---|---|---|---|
| Raw Data | 93.1 | 82.9 | 21.5 |
| Zero-Sum game | 91.8 | 72.5 | 11.1 |
| Non-Zero-Sum Game | 91.6 | 62.1 | 0.7 |
| Global-Optima | 92.0 | 61.4 | 0.0 |
| Hybrid | 92.5 | 61.4 | 0.0 |
$^*$ Absolute difference between adversary accuracy and random chance
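The $\Delta$ column for CelebA can be recomputed from the gender accuracies. The chance level is not stated on the slide; the value 61.4 used below is an inference from the rows whose $\Delta$ is 0.0, and the function name `delta` is introduced here for illustration.

```python
def delta(adversary_acc, chance_acc):
    """Absolute gap between adversary accuracy and random chance (percentage points)."""
    return abs(adversary_acc - chance_acc)

chance = 61.4  # assumption: inferred from the rows that report a gap of 0.0

gender_acc = {
    "Raw Data": 82.9,
    "Zero-Sum game": 72.5,
    "Non-Zero-Sum Game": 62.1,
    "Global-Optima": 61.4,
    "Hybrid": 61.4,
}
deltas = {name: round(delta(acc, chance), 1) for name, acc in gender_acc.items()}
```

A gap near zero means the adversary does no better than always guessing the majority class, which is the sense in which the sensitive attribute has been removed from the embedding.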
Application-3: Mitigating Privacy Leakage
Application-4: Illumination Invariance
- 38 identities and 5 illumination directions
- Target: Identity Label
- Sensitive: Illumination Label
| Method | $s$ (lighting) | $t$ (identity) |
|---|---|---|
| Raw Data | 96 | 78 |
| NN + MMD (NeurIPS 2014) | - | 82 |
| VFAE (ICLR 2016) | 57 | 85 |
| Zero-Sum Game (NeurIPS 2017) | 57 | 89 |
| Non-Zero-Sum Game | 40 | 89 |
| Global-Optima | 20 | 86 |
Open Questions
- Understand fundamental trade-off between utility and fairness.
- Understand achievable trade-off between utility and fairness.
- Optimization of adversarial training, especially three player games under general settings.
- $\dots$
Summary
- A step towards explicit control of:
- semantic information in learned representations
- access to information in learned representations
- Many unanswered open questions and practical challenges.
Human Analysis Lab