The Utility-Fairness Trade-Offs in Learning Fair Representations
Wayne State University (CSE Graduate Seminar)
Speech Processing
Image Analysis
Natural Language Processing
Physical Sciences
Build ML systems that are fair while retaining utility.
Probability of correct prediction is the same across demographic groups.
True positive rate of predictions is the same across demographic groups.
Among eligible candidates, probability of correct prediction is the same across demographic groups.
$Z \perp \!\!\! \perp S \Rightarrow \hat{Y} \perp \!\!\! \perp S$
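A brief justification sketch (assuming the predictor is a measurable function of the representation alone, $\hat{Y}=g(Z)$, with no direct access to $S$): independence is preserved under measurable maps, since $P(\hat{Y}\in A,\ S\in B)=P\big(Z\in g^{-1}(A),\ S\in B\big)=P\big(Z\in g^{-1}(A)\big)\,P(S\in B)=P(\hat{Y}\in A)\,P(S\in B)$ for all measurable sets $A, B$, hence $\hat{Y} \perp \!\!\! \perp S$, i.e., demographic parity holds for any downstream predictor built on $Z$.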
Unstable Optimization
Lack of Invariance Guarantees
Linear Dependence: $\displaystyle C_{SZ}\approx\displaystyle \frac{1}{n}{\color{Maroon}\tilde{\bm S}} \tilde{\bm Z}^T$
Universal Dependence: $\displaystyle \Sigma_{SZ}\approx\frac{1}{n}{\color{Maroon}\tilde{\bm K}_S} \tilde{\bm K}_Z$
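A minimal NumPy sketch of the empirical linear cross-covariance estimate above, assuming $S$ and $Z$ are given as data matrices with samples in rows; the function and variable names are illustrative, not from the original work.

```python
import numpy as np

def linear_cross_covariance(S, Z):
    """Empirical linear cross-covariance C_SZ ~ (1/n) * S_tilde @ Z_tilde.T,
    where S_tilde, Z_tilde are mean-centered data with features in rows."""
    n = S.shape[0]                      # S: (n, d_s), Z: (n, d_z), samples in rows
    S_tilde = (S - S.mean(axis=0)).T    # (d_s, n) centered sensitive attributes
    Z_tilde = (Z - Z.mean(axis=0)).T    # (d_z, n) centered representations
    return (S_tilde @ Z_tilde.T) / n    # (d_s, d_z) cross-covariance matrix
```

The universal (kernel) counterpart replaces the centered data matrices with centered Gram matrices; a sketch of the resulting HSIC estimate appears after the HSIC definition below.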
Cross-covariance as a bilinear form $\mathcal H_Z\times\mathcal H_S\rightarrow \mathbb R:\ Cov\left(\alpha(Z),\ \beta({\color{Maroon}S})\right)=\big\langle \beta, \Sigma_{SZ}\, \alpha \big\rangle_{\mathcal H_S}$, where $\Sigma_{SZ}:\mathcal H_Z\rightarrow \mathcal H_S$ is the cross-covariance operator.
$Z \perp \!\!\! \perp {\color{Maroon}S} \Leftrightarrow \Sigma_{SZ} =0 \Leftrightarrow \left\|\Sigma_{SZ}\right\| =0$, where $\|\cdot\|$ can be any operator norm
HSIC$(Z,S)=\left\|\Sigma_{SZ}\right\|^2_{\text{HS}}=\displaystyle\sum_{\alpha\in\mathcal U_Z}\sum_{\beta\in \mathcal U_S}Cov^2(\alpha(Z),\beta(S))$, where $\mathcal U_Z$ and $\mathcal U_S$ are orthonormal bases of $\mathcal H_Z$ and $\mathcal H_S$
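A small sketch of the standard biased empirical HSIC estimator built from centered Gram matrices, $\tfrac{1}{n^2}\,\mathrm{tr}(\tilde{\bm K}_Z \tilde{\bm K}_S)$ (some references normalize by $1/(n-1)^2$ instead); the Gaussian kernels and bandwidths are illustrative assumptions, and inputs are taken to be 2-D arrays with samples in rows.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) for rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(Z, S, sigma_z=1.0, sigma_s=1.0):
    """Biased empirical HSIC(Z, S) = (1/n^2) tr(K_Z H K_S H), H = I - 11^T/n."""
    n = Z.shape[0]                      # Z: (n, d_z), S: (n, d_s)
    H = np.eye(n) - np.ones((n, n)) / n # centering matrix
    Kz = gaussian_gram(Z, sigma_z)
    Ks = gaussian_gram(S, sigma_s)
    return np.trace(Kz @ H @ Ks @ H) / n**2
```

HSIC is zero exactly when $Z$ and $S$ are independent (for universal kernels), which is what makes it usable as the Dep$(\cdot,\cdot)$ term in the objective below.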
$\mathcal A_r:=\Big\{(f_1,\cdots,f_r)\,|\,Cov(f_i(X), f_j(X))+\gamma \langle f_i, f_j\rangle_{\mathcal H_X}=\delta_{i,j} \Big\}$
$\displaystyle\sup_{\bm f\in\mathcal A_r} {\Big\{J(\bm f):=\color{ForestGreen}(1-\lambda)\,\text{Dep}(Z, Y)}{\color{Maroon}-\lambda\,\text{Dep}(Z, S) }\Big\}$
Solution: the eigenfunctions corresponding to the $r$ largest eigenvalues $\tau$ of the generalized eigenvalue problem $\big({\color{ForestGreen}(1-\lambda)\, \Sigma_{YX}^*\,\Sigma_{YX}}{\color{Maroon}-\lambda\,\Sigma_{SX}^*\Sigma_{SX} }\big)\bm f = \tau (\Sigma_{XX}+\gamma I)\bm f$
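A finite-dimensional sketch of this closed-form solution under a linear-kernel simplification, where the operators reduce to empirical covariance matrices and the eigenfunctions to eigenvectors; this is an illustrative analogue, not the authors' full kernelized implementation, and the names, default values of $\lambda$, $\gamma$, and $r$ are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def fair_encoder_linear(X, Y, S, r=2, lam=0.5, gamma=1e-3):
    """Solve ((1-lam) Cyx^T Cyx - lam Csx^T Csx) f = tau (Cxx + gamma I) f
    and keep the eigenvectors with the r largest generalized eigenvalues."""
    n = X.shape[0]                       # X: (n, d_x), Y: (n, d_y), S: (n, d_s)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sc = S - S.mean(axis=0)
    Cxx = Xc.T @ Xc / n
    Cyx = Yc.T @ Xc / n
    Csx = Sc.T @ Xc / n
    A = (1 - lam) * Cyx.T @ Cyx - lam * Csx.T @ Csx
    B = Cxx + gamma * np.eye(X.shape[1])  # regularized covariance, positive definite
    tau, F = eigh(A, B)                   # eigenvalues returned in ascending order
    F = F[:, np.argsort(tau)[::-1][:r]]   # columns: top-r generalized eigenvectors
    return F
```

Encoding with $Z = \tilde{\bm X} F$ then yields an $r$-dimensional representation whose dependence on $Y$ is promoted and whose dependence on $S$ is suppressed, with $\lambda$ sweeping out the utility-fairness trade-off.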
Scale exacerbates hate content.
Scale exacerbates stereotypes.