Towards a Science of Trustworthy AI: Fundamental Limits and How to Achieve Them


Stony Brook University

Slides: hal.cse.msu.edu/talks
Vishnu Boddeti

Michigan State University


AI is becoming critical infrastructure

Tesla FSD
Chat Bots
Rapid AI
Walker from UBTECH

The Story Benchmarks Tell Us

Progress is rapid, AGI is imminent.

State of Affairs

(report from the real world)

Improving Accuracy, Same Old Reliability

High Predictive Capability, but Biased

"Detroit changes rules for police use of facial recognition after wrongful arrest of Black man"


High Predictive Capability, but Leak Private Information

High Quality T2I Models, but Same Old Stereotypes

Prompt: "a photo of a doctor"

Prompt: "a photo of an Indian doctor"

What % of generated images depict men? (FLUX.1)

"a doctor": 78%
"an Indian doctor": 98% (+20 percentage points)

Composing concepts amplifies stereotypes.

High Predictive Accuracy, but Not Robust

Can synthetic image detectors tell these images apart?

Unsteered generation
Detector correctly flags as synthetic

PolyJuice-steered generation
Detector fooled

Attack success rate against state-of-the-art synthetic image detectors

Without steering (averaged across models): ~55%
With PolyJuice steering: ~91% (+36 points)

Simple latent-space steering defeats deployed detectors.

How Are These Challenges Addressed Today?


$$\min_\theta \; \mathcal{L}_{\text{task}}(\theta) \;+\; \lambda \cdot \mathcal{L}_{\text{constraint}}(\theta)$$

Pick a proxy loss. Pick a $\lambda$. Run gradient descent. Report a single number.
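
A minimal sketch of this recipe on a toy problem may help. The two quadratic losses, the step size, and the names `task_loss`, `constraint_loss`, and `minimize_penalized` are illustrative assumptions, not taken from any specific system. The point is that each choice of $\lambda$ yields a single point on the trade-off, so reporting one number hides the rest of the curve.

```python
# Toy illustration with assumed losses: a penalized objective and a hand-picked lambda.

def task_loss(theta):
    # Hypothetical task loss, minimized at theta = 1.
    return (theta - 1.0) ** 2

def constraint_loss(theta):
    # Hypothetical constraint proxy, minimized at theta = -1.
    return (theta + 1.0) ** 2

def minimize_penalized(lam, lr=0.05, steps=2000):
    """Plain gradient descent on task_loss + lam * constraint_loss."""
    theta = 0.0
    for _ in range(steps):
        grad = 2.0 * (theta - 1.0) + lam * 2.0 * (theta + 1.0)
        theta -= lr * grad
    return theta

# Sweeping lambda visits several trade-off points; a single lambda reports only one.
sweep = []
for lam in [0.0, 0.5, 1.0, 4.0]:
    theta = minimize_penalized(lam)
    sweep.append((lam, task_loss(theta), constraint_loss(theta)))

for lam, l_task, l_con in sweep:
    print(f"lambda={lam:.1f}  task={l_task:.3f}  constraint={l_con:.3f}")
```

For these particular quadratics the minimizer is $\theta^\star = (1-\lambda)/(1+\lambda)$, so the task loss grows and the constraint loss shrinks as $\lambda$ increases.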

My Research Vision


My research answers these questions.

Characterize fundamental limits — prove the trade-off is inherent, not a limitation of current methods.

Map the frontier — measure where existing methods stand relative to what is achievable.

Close the gap — build systems that approach the limits through principled constraints and optimizers.

My Approach to Building Trustworthy AI Systems


A common methodology across fairness, privacy, robustness, and controllability:

Diagnose

Where do models violate constraints?

Characterize

What are the fundamental limits any method must obey?

Achieve

Design systems that approach these limits

Selected Contributions

Fairness & Controllability

  • OASIS ICLR 2025 Spotlight
  • PolyJuice NeurIPS 2025
  • Obliviator NeurIPS 2025
  • CoInD ICLR 2025
  • DiverseFlow CVPR 2025
  • FairerCLIP ICLR 2024
  • Utility-Fairness Trade-Offs CVPR 2024
  • Dataset Scaling FAccT 2024
  • LAION's Den NeurIPS 2023
  • Invariant Representations TMLR 2022 Featured
  • Kernelized ARL ICCV 2019
  • Info Leakage CVPR 2019 Oral

Cryptographic Privacy

  • Book Chapter Springer 2026
  • CryptoFace CVPR 2025
  • SecureRAG NeurIPS WS 2025
  • HE Template Fusion TBIOM 2025
  • Shielding Face Repr. FG 2025
  • AutoFHE USENIX Security 2024
  • FHE Face Analytics FG 2024
  • FHE Score Fusion WIFS 2023
  • HEFT IJCB 2022 Best Paper
  • HERS TBIOM 2022 Best Paper
  • Secure Face Matching BTAS 2018
  • Privacy-Preserving VL ICCV 2017

Physics-Informed AI

  • Mechanics-Informed AE Nat. Comms 2024 Editor's Highlight
  • HADAR Nature 2023 Cover
  • Boundary Detection SMASIS 2022 Best Paper

Other Work

  • SEAL CVPR 2025 Oral
  • Gen. Zero-Shot CIR CVPR 2025
  • Symbolic Algorithms IROS 2023 Best Paper Finalist
  • Transmission-Friendly CNNs TMC 2023 Best Paper
  • Neural Arch. Transfer TPAMI 2021
  • NSGANetV2 ECCV 2020 Oral
  • NSGA-Net GECCO 2019 Best Paper

Today's Talk


  • Fairness & Controllability — Optimal trade-offs and concept erasure
  • Privacy — Encrypted inference via homomorphic encryption
  • Other Work — Physics-informed AI for engineering
  • Vision — Joint constraints and composed systems


Part 1: Fairness & Controllability

What are the optimal trade-offs for bias mitigation and concept erasure?

From Fair Learning to Fair Representation Learning

$Z \perp \!\!\! \perp S \Rightarrow \hat{Y} \perp \!\!\! \perp S$

Learn representation $\mathbf{z}$ that retains target information $Y$ while removing sensitive attribute $S$.

Statistical Dependence Formulation

  • Bi-Objective Optimization Problem:
    • Encoder extracts features $\mathbf{z}$
    • Statistical dependence between target task and features $\mathbf{z}$
    • Statistical dependence between sensitive attribute and features $\mathbf{z}$
    $$\max_{\mathbf{\Theta}_E} \; \underbrace{\color{cyan}{\text{Dep}(Z,Y)}}_{\color{cyan}{\text{target dependence}}} \quad \text{s.t.} \quad \underbrace{\color{orange}{\text{Dep}(Z,S)}}_{\color{orange}{\text{sensitive dependence}}} \leq \alpha$$
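
The formulation above leaves the dependence measure $\text{Dep}(\cdot,\cdot)$ abstract. As one possible instantiation, the sketch below uses the biased empirical Hilbert-Schmidt Independence Criterion (HSIC) with a Gaussian kernel. The choice of HSIC, the kernel bandwidth, and the function names `rbf_gram` and `hsic` are assumptions made for illustration, not necessarily the measure used in the cited work.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix for the rows of x, shape (n, d)."""
    sq_norms = np.sum(x ** 2, axis=1, keepdims=True)
    sq_dists = np.maximum(sq_norms + sq_norms.T - 2.0 * x @ x.T, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))                            # stand-in for features Z
s_dependent = z ** 2 + 0.1 * rng.normal(size=(n, 1))   # nonlinear function of z
s_independent = rng.normal(size=(n, 1))                # unrelated to z

dep_high = hsic(z, s_dependent)
dep_low = hsic(z, s_independent)
print(f"Dep(Z, S) when S depends on Z:   {dep_high:.4f}")
print(f"Dep(Z, S) when S is independent: {dep_low:.4f}")
```

In the dependent case `s_dependent` is uncorrelated with `z` yet still yields a clearly larger value, which is the property a constraint of the form $\text{Dep}(Z,S) \leq \alpha$ relies on.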

Making Bias Mitigation Near-Optimal

    • Standard Adversarial Representation Learning - $E(\hat{Y}) \perp \!\!\! \perp S$
    • Universal Dependence Measure: $Z \perp \!\!\! \perp S$ [TMLR 2022]
    • End-to-End Universal Dependence Measure: $Z \perp \!\!\! \perp S$ [CVPR 2024]

Characterize: What is the Optimal Trade-Off?

  • Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022 (Outstanding Certification Finalist)

How to Estimate These Trade-Offs?

U-FaTE (Utility-Fairness Trade-Off Estimator)
  • Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024

Utility-Fairness Trade-Offs

Folktables
  • Sadeghi, Dehdashtian, Boddeti, "On Characterizing the Trade-off in Invariant Representation Learning," TMLR 2022
  • Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024

Map: Where Do Existing Models Stand?

  • $Y$: high cheekbones and $S$: age and sex
Over 1,000 supervised models: most lie far from the optimal trade-off.
  • Sadeghi, Dehdashtian, Boddeti, TMLR 2022; Dehdashtian, Sadeghi, Boddeti, CVPR 2024

Foundation Models Are No Better

  • $Y$: high cheekbones and $S$: age and sex
Over 100 zero-shot CLIP models: same gap from optimal.
  • Dehdashtian, Sadeghi, Boddeti, "Utility-Fairness Trade-Offs and How to Find Them," CVPR 2024

Bias in CLIP's Zero-Shot Prediction

  • Dehdashtian*, Wang*, Boddeti, "FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs," ICLR 2024

Achieve: FairerCLIP - Debiasing Foundation Models

  • Dehdashtian*, Wang*, Boddeti, "FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs," ICLR 2024


From fairness to controllability

What is the cost of erasing a concept from a model?

Obliviator: The Cost of Nonlinear Guardedness in Concept Erasure

Formalize erasure as statistical independence: $\text{Dep}(Z_\theta, S) = 0 \iff Z_\theta \perp \!\!\! \perp S$
$$\inf_\theta\quad\underbrace{\textrm{Dep}(Z_\theta,S)}_{\text{Minimize Statistical Dependency}}-\underbrace{\textrm{Dep}(Z_\theta,Y)}_{\text{A Proxy to Preserve Utility Information}}$$

Obliviator: The Cost of Nonlinear Concept Erasure


Prior methods only achieve linear guardedness.
Obliviator achieves full nonlinear guardedness.
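
A toy example can illustrate why linear guardedness alone is not enough. In the sketch below, a one-dimensional representation `z` and a binary concept `s` are constructed (both are assumptions made for illustration) so that `s` is uncorrelated with `z`, meaning the best affine least-squares predictor of `s` from `z` does no better than a constant, while a simple nonlinear feature of `z` still reveals the concept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
z = rng.normal(size=n)                                   # toy 1-D representation
# Hypothetical concept that depends only on the magnitude of z.
s = (np.abs(z) > np.median(np.abs(z))).astype(float)

# Pearson correlation is used as a proxy for what a linear probe can extract.
linear_corr = np.corrcoef(z, s)[0, 1]
# The same proxy applied to a nonlinear feature of z.
nonlinear_corr = np.corrcoef(z ** 2, s)[0, 1]

print(f"corr(z,   s) = {linear_corr:+.3f}  (linear probe sees almost nothing)")
print(f"corr(z^2, s) = {nonlinear_corr:+.3f}  (nonlinear feature recovers the concept)")
```

A representation like this would pass a linear erasure check while still leaking the concept to any adversary that can square its input.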

Single-Stage Optimization Fails; Iterative Erasure Traces the Frontier


Single-stage optimization yields scattered, suboptimal solutions.
Iterative erasure traces the full Pareto frontier between utility and erasure.

Obliviator Reveals: Nonlinear Erasure Has an Unavoidable Cost


Representation: Frozen
Two Pareto frontiers (supervised/unsupervised) mirror the DST/LST structure from fairness.
Erasure has an unavoidable cost, and access to task labels significantly shifts the achievable frontier.

Mitigating Stereotypes in Generative Models

DiverseFlow: Sample-efficient diverse mode coverage in flow models.


Part 2: Cryptographic Privacy

Can we run AI on data that remains encrypted throughout?

The blind spot of traditional encryption

Privacy of user data is not guaranteed.

FHE can help AI models achieve trustworthiness

FHE enables AI models to process encrypted data without decryption.

FHE Inverts the Cost Hierarchy of Computation



Standard Hardware (GPU)
  • All operations roughly equal cost
  • Nonlinearities are cheap
  • Go deep: more layers = more accuracy

Standard AI architectures are not compatible with the FHE cost model.

How to Adapt Neural Networks for FHE

Polynomial approximation for non-linear activations
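
As a minimal sketch of the idea, the code below fits polynomials of a few degrees to ReLU and evaluates them using only additions and multiplications, the operations an FHE scheme supports natively. The least-squares fit, the interval $[-4, 4]$, the degrees, and the names `fit_relu_poly` and `horner` are illustrative assumptions. Practical systems may use other approximation criteria and evaluation schemes with lower multiplicative depth.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit_relu_poly(degree, lo=-4.0, hi=4.0, n=2001):
    """Least-squares polynomial fit to ReLU on [lo, hi]; coefficients low -> high."""
    x = np.linspace(lo, hi, n)
    return P.polyfit(x, np.maximum(x, 0.0), degree)

def horner(coeffs, x):
    """Evaluate the polynomial with additions and multiplications only."""
    acc = np.full_like(x, coeffs[-1])
    for c in coeffs[-2::-1]:
        acc = acc * x + c  # one multiplication per degree (sequential depth)
    return acc

x = np.linspace(-4.0, 4.0, 801)
relu = np.maximum(x, 0.0)
max_err = {}
for degree in (2, 4, 8):
    coeffs = fit_relu_poly(degree)
    max_err[degree] = float(np.max(np.abs(horner(coeffs, x) - relu)))
    print(f"degree {degree}: max |poly - ReLU| on [-4, 4] = {max_err[degree]:.3f}")
```

Higher degrees track ReLU more closely but require more ciphertext multiplications, which is the accuracy-latency tension the following slides address.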

Space of Homomorphic Neural Architectures

How to effectively trade off accuracy and latency?

AutoFHE: Network-Level Co-Design for Encrypted Inference


Key insight: Approximate the end-to-end function, not individual activations, via per-layer polynomials.

Characterize: Accuracy-Latency Pareto Frontier Under FHE

Achieve: CryptoFace - FHE-Native Deep Networks


AutoFHE adapts existing architectures.

Can we do better by designing architectures natively for encryption?


  • Shallow, parallel patch-based design
  • Minimizes multiplicative depth
  • Exploits parallelism across ciphertext slots
  • Near-constant latency across resolutions

CryptoFace: Comparable Accuracy, 7.5x Lower Latency

| Approach | Resolution | Network | Bootstraps | Avg. Accuracy (%) | Latency (s) |
|---|---|---|---|---|---|
| MPCNN | 64x64 | ResNet44 | 43 | 89.64 | 1,640 |
| AutoFHE | 64x64 | ResNet32 | 8 | 82.69 | 667 |
| CryptoFace | 64x64 | CryptoFaceNet4 | 2 | 89.42 | 220 |
| CryptoFace | 128x128 | CryptoFaceNet16 | 2 | 91.46 | 241 |
7.5x speedup over MPCNN at 64x64 (27 min → 3.7 min), at comparable accuracy.
Near-constant latency across resolutions (64x64 → 128x128).
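
The headline ratios can be rechecked directly from the latency column. The short script below only restates the reported numbers; the dictionary layout and variable names are illustrative.

```python
# Latencies in seconds, copied from the reported results.
latency_s = {
    ("MPCNN", "64x64"): 1640,
    ("AutoFHE", "64x64"): 667,
    ("CryptoFace", "64x64"): 220,
    ("CryptoFace", "128x128"): 241,
}

cryptoface_64 = latency_s[("CryptoFace", "64x64")]
speedup_vs_mpcnn = latency_s[("MPCNN", "64x64")] / cryptoface_64
speedup_vs_autofhe = latency_s[("AutoFHE", "64x64")] / cryptoface_64
resolution_ratio = latency_s[("CryptoFace", "128x128")] / cryptoface_64

print(f"Speedup over MPCNN:   {speedup_vs_mpcnn:.2f}x")
print(f"Speedup over AutoFHE: {speedup_vs_autofhe:.2f}x")
print(f"128x128 vs 64x64 latency ratio: {resolution_ratio:.2f}")
```

Quadrupling the pixel count increases latency by roughly 10%, which is what "near-constant latency" refers to.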

Selected for the homomorphicencryption.org benchmark suite (Google, Amazon, Intel).

Beyond Classification: Encrypted Retrieval-Augmented Generation


  • FHE-based encrypted search over knowledge bases
  • Attribute-based encryption for access control
Encrypted AI scales beyond classification to complex LLM pipelines.
Amina Bassit and Vishnu Boddeti, SecureRAG, NeurIPS GenAI4Health Workshop 2025


Other Work: Physics-Informed AI

Can physical laws serve as constraints that keep AI grounded in reality?

"Ghosting Effect" in Thermal Vision




Why are thermal images blurry?



Ghosting effect: arises when the radiated signal is stronger than the reflected ambient signal.

TeX Decomposition of Thermal Signals

TeX Decomposition: Solve inverse problem with physics constraints.

Thermal vs TeX Decomposition

Heat Assisted Ranging

$TeX_{night} \approx RGB_{day} > IR_{night}$
  • HADAR thermography reaches the Cramér-Rao bound on temperature accuracy, beating commercial thermography.
Bao, ..., Boddeti, Jacob. Heat-Assisted Detection and Ranging. Nature 619, 2023 (Cover Article).

Mechanics-Informed Structural Health Monitoring

Mechanics constraints lead to 35% better zero-shot damage detection.

Estimating Non-Linear Parameter Fields in Multi-Physics Problems

Solving inverse problems with the adjoint method.
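
To illustrate the adjoint method in its simplest form, the sketch below considers a linear forward model $A(p)\,u = b$ with $A(p) = K_0 + \sum_i p_i K_i$ and a misfit $J(p) = \tfrac{1}{2}\lVert u(p) - u_{\text{obs}}\rVert^2$. The gradient then follows from one extra linear solve, $A^\top \lambda = u - u_{\text{obs}}$, giving $\partial J / \partial p_i = -\lambda^\top K_i u$. The linear model, the random matrices, and all names here are assumptions for illustration; the multi-physics problems in the talk are nonlinear and far larger.

```python
import numpy as np

def assemble(p, K0, Ks):
    """System matrix A(p) = K0 + sum_i p_i * K_i (assumed linear in p)."""
    return K0 + sum(pi * Ki for pi, Ki in zip(p, Ks))

def loss(p, K0, Ks, b, u_obs):
    """Data misfit J(p) = 0.5 * ||u(p) - u_obs||^2."""
    u = np.linalg.solve(assemble(p, K0, Ks), b)
    return 0.5 * np.sum((u - u_obs) ** 2)

def adjoint_gradient(p, K0, Ks, b, u_obs):
    """Gradient of J with one forward and one adjoint solve."""
    A = assemble(p, K0, Ks)
    u = np.linalg.solve(A, b)               # forward solve
    lam = np.linalg.solve(A.T, u - u_obs)   # adjoint solve
    return np.array([-(lam @ (Ki @ u)) for Ki in Ks])

rng = np.random.default_rng(0)
n = 6
K0 = 2.0 * np.eye(n)
Ks = [np.outer(v, v) for v in rng.normal(size=(3, n))]  # symmetric PSD terms
b = rng.normal(size=n)
p_true = np.array([1.0, 0.5, 2.0])
u_obs = np.linalg.solve(assemble(p_true, K0, Ks), b)    # synthetic observations

p = np.array([0.8, 0.8, 0.8])
grad_adj = adjoint_gradient(p, K0, Ks, b, u_obs)

# Central finite differences as an independent check of the adjoint gradient.
eps = 1e-6
grad_fd = np.array([
    (loss(p + eps * e, K0, Ks, b, u_obs) - loss(p - eps * e, K0, Ks, b, u_obs)) / (2 * eps)
    for e in np.eye(len(p))
])
print("adjoint gradient:          ", grad_adj)
print("finite-difference gradient:", grad_fd)
```

The adjoint gradient costs two linear solves regardless of the number of parameters, whereas finite differences need two solves per parameter.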


Vision: Future Directions

What happens when multiple trust constraints must hold simultaneously across composed systems?

The Next Frontier: Scaling the Science of Trust

Trust in Composed and Agentic AI Systems

$$f = f_{lm}(f_{proj} \circ f_{vis}, f_{text})$$
  • Do guarantees for individual components survive composition?

  • Can failures that emerge only from interaction be diagnosed systematically?

  • Can trust be auditable by construction?

Joint Limits: When Constraints Must Coexist

NSF SCH: Fundamental Limits of Fair and Privacy-Preserving Healthcare Models (2025-2029)

Concluding Remarks

Towards a Science of Trustworthy AI


AI is becoming critical infrastructure.

Every other form of infrastructure has a science of its limits.

AI does not.

My research builds towards this science.
