Co-Designing AI and Homomorphic Architectures for Secure AI


Pixel Biometrics Seminar

Slides: hal.cse.msu.edu/talks
Vishnu Boddeti

Michigan State University


Progress In Artificial Intelligence

Speech Processing
Image Analysis
Natural Language Processing
Physical Sciences



Key Drivers
Data, Compute, Algorithms

Widespread deployment in the real world, especially as cloud services.

Today: Data is Encrypted Only During Communication

Privacy of user data is not guaranteed.

Privacy and Security Requirements in AI

...consent should be given for all purposes...

What are we trying to secure?

  • Secure Data
    • Both the input $x$ and the outcome $f(x)$
    • All intermediate computations are encrypted.
    • Preserves user privacy.

Example: Healthcare AI-based SaaS



However, both the data $x$ and the circuit $f$ are sensitive.

Securing User Data



Privacy of Alice's data is cryptographically guaranteed, but Bob has to give away his circuit to CCS.

Consequences of Evaluating the Circuit in Cleartext under FHE

  • Circuits are usually
    • Proprietary functions (e.g., ML trained models)
    • Directly derived from personal data (e.g., health data)
  • Circuits can be leaked from
    • Noise term analysis
    • Timing attacks
Circuit protection is as crucial as data protection.

Privacy protection of both data and circuit

Today's Agenda



AI + Encryption
Is there an encryption scheme that satisfies our security requirements?

Fully Homomorphic Encryption

A Primer

What is Fully Homomorphic Encryption?

Run programs on encrypted data without ever decrypting it.
FHE can—in theory—handle universal computation.















Apple: Secure Caller ID and Secure Photo Search
Microsoft: Secure Password Search in Edge Browser

Private Information Retrieval


  • Computational complexity $\mathcal{O}\left( K \cdot \left( \#\mathrm{M}_{\mathrm{HE}} + \#\mathrm{R}_{\mathrm{HE}} + \#\mathrm{A}_{\mathrm{HE}} \right) \right)$
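As a toy illustration of the PIR pattern behind this cost (plain Python, not a real HE library; helper names are hypothetical): the client encodes its query index as a one-hot selector, and the server replies with the inner product of that selector and its database of $K$ records. Under HE the selector is encrypted, so the server never learns the index, at the cost of $K$ homomorphic multiplications and additions.

```python
# Toy cleartext sketch of the PIR pattern (hypothetical helper names).
def pir_query(index, db_size):
    """Client side: one-hot selector (would be encrypted under HE)."""
    return [1 if i == index else 0 for i in range(db_size)]

def pir_answer(selector, database):
    """Server side: inner product over all K records."""
    return sum(s * d for s, d in zip(selector, database))

database = [42, 7, 99, 13]
assert pir_answer(pir_query(2, len(database)), database) == 99
```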

Tutorial at biometric-privacy-security.github.io

Homomorphic Evaluation of Encrypted Data





    • Packing
    • Encoding

Primitive Operations Supported by Arithmetic FHE Schemes



Main Idea: Noisy Inner Products

Private Key

$\mathbf{s} = \begin{bmatrix}10 \\ 82 \\ 50 \\ 51\end{bmatrix}$
Adversary's Goal: Solve for private key from public key.
| Domain         | Noise        | Problem                    | Solution                      |
|----------------|--------------|----------------------------|-------------------------------|
| $\mathbb{R}$   | $\times$     | System of Linear Equations | Gaussian Elimination          |
| $\mathbb{R}$   | $\checkmark$ | Least Squares Problem      | Least Squares Estimator       |
| $\mathbb{Z}_q$ | $\checkmark$ | Learning with Errors       | Infeasible in Polynomial Time |

Encryption and Decryption

  • a: fixed random vector
  • e: noise from discrete Gaussian
Relies on hardness of the Learning with Errors problem.
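A minimal cleartext sketch of this scheme, with toy parameters far too small to be secure, a crude noise distribution standing in for the discrete Gaussian, and (as in the slide) a fixed vector $\mathbf{a}$ rather than one sampled fresh per ciphertext:

```python
import random

# Toy LWE-style encryption of a single bit (illustrative parameters only).
q, n = 257, 4
s = [random.randrange(q) for _ in range(n)]            # private key
a = [random.randrange(q) for _ in range(n)]            # fixed random vector
e = random.choice([-2, -1, 0, 1, 2])                   # small noise term
b = (sum(ai * si for ai, si in zip(a, s)) + e) % q     # noisy inner product

def encrypt(m):
    """Scale the bit m up to q/2 so it sits far above the noise."""
    return (b + m * (q // 2)) % q

def decrypt(c):
    """Subtract <a, s>, then round: near 0 -> 0, near q/2 -> 1."""
    centered = (c - sum(ai * si for ai, si in zip(a, s))) % q
    return 1 if q // 4 < centered < 3 * q // 4 else 0

assert decrypt(encrypt(0)) == 0 and decrypt(encrypt(1)) == 1
```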

Data Encoding

Operating with cyclotomic polynomials enables operations on vectors.
\[
\begin{aligned}
\text{Secret Key: } s &= s(x), \text{ coefficients sampled from } \{-1, 0, 1\} \\
\text{Public Key: } p &= (a(x), b(x)), \text{ where} \\
a(x) &= \text{random polynomial sampled from } \mathbb{Z}_q[x]/(x^N+1) \\
b(x) &= \langle a(x), s(x) \rangle + e \;\mathrm{mod}\; \mathbb{Z}_q[x]/(x^N+1)
\end{aligned}
\]
  • message: $m(x)$ polynomial in $\mathbb{Z}_q[x]/(x^N+1)$
  • Encryption: $c(x) = b(x) + \Delta m(x) \text{ mod } \mathbb{Z}_q[x]/(x^N+1)$
  • Decryption: $\tilde{m}(x) = (c(x) - \langle a(x), s(x) \rangle)/\Delta$
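The ring arithmetic in $\mathbb{Z}_q[x]/(x^N+1)$ underlying these operations can be sketched in a few lines; the only subtlety is the negacyclic wrap-around $x^N = -1$. Parameters below are toy values:

```python
# Toy arithmetic in Z_q[x]/(x^N + 1) with illustrative parameters.
q, N = 97, 4

def ring_mul(f, g):
    """Multiply coefficient lists f, g modulo (x^N + 1) and modulo q."""
    res = [0] * N
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < N:
                res[k] = (res[k] + fi * gj) % q
            else:                       # x^(i+j) = -x^(i+j-N)
                res[k - N] = (res[k - N] - fi * gj) % q
    return res

# x * x^3 = x^4 = -1 in this ring
assert ring_mul([0, 1, 0, 0], [0, 0, 0, 1]) == [q - 1, 0, 0, 0]
```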

Data Packing

Data Privacy
Data and Circuit Privacy

AutoFHE

Efficient CNN Evaluation over FHE

Co-Designing AI and Homomorphic Architectures

    • Security Requirement

Encryption Parameters
  • Cyclotomic polynomial degree: $N$
  • Level: $L$
  • Modulus: $Q_l=\prod_{i=0}^{l} q_i$, $0 \leq l \leq L$
  • Bootstrapping Depth: $K$
  • Hamming Weight: $h$
    • Latency



CNNs under Homomorphic Encryption

Deep CNNs under Homomorphic Encryption

  • Level: number of multiplications allowed before bootstrapping is required
Drawbacks of bootstrapping: high latency and high memory footprint

Polynomial Neural Architectures

How to effectively trade off accuracy and latency?
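To see why polynomial architectures arise at all: arithmetic FHE evaluates only additions and multiplications, so ReLU must be replaced by a polynomial, and the degree directly controls the accuracy-versus-depth trade-off. A small illustrative sketch (NumPy least-squares fit; production systems typically use minimax approximations instead):

```python
import numpy as np

# Fit polynomials of increasing degree to ReLU on [-1, 1]. Higher degree
# fits better but consumes more multiplicative depth under FHE.
x = np.linspace(-1.0, 1.0, 1001)
relu = np.maximum(x, 0.0)

for deg in (2, 4, 8):
    p = np.polyfit(x, relu, deg)                  # least-squares fit
    err = np.abs(np.polyval(p, x) - relu).max()
    print(f"degree {deg}: max error {err:.4f}")
```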

Our Key Insight


Approximate the

end-to-end function represented by the network

instead of the activation function.

How to optimize end-to-end polynomial neural architecture?

Multi-Objective evolutionary optimization

Joint Search for Layerwise EvoReLU and Bootstrapping Operations



Joint Search Problem: Multi-Objective Search
    • Flexible Architecture
    • On-demand Bootstrapping

Bi-Level Multi-Objective Framework

✅ Layerwise Polynomials ✅ Diverse Solutions ✅ Better Trade-off Fronts

Experimental Setup

Dataset: CIFAR10
  • 50,000 training images
  • 10,000 test images
  • 32x32 resolution, 10 classes


Hardware & Software
  • Amazon AWS, r5.24xlarge
  • 96 CPUs, 768 GB RAM
  • Microsoft SEAL, 3.6

Latency and Accuracy Trade-offs under FHE


Data Privacy
Data and Circuit Privacy

CryptoFace

End-to-End Encrypted Face Recognition

What are the privacy and security risks in Face Recognition?


Prior Work: Template Protection

End-to-End Encrypted Face Recognition

Offline Enrollment

Online Verification

Challenges of End-to-End Encrypted Face Recognition

Image Resolution


    • Face recognition typically uses 112x112 images.
    • Encryption slot size is typically $16,384$.
      • Prior work processes 32x32 images.
      • Cannot encode and process high-resolution images.
    • Increases computational costs.
Computational Complexity
Rethink neural and homomorphic architecture designs.

Key Idea: Mixture of Shallow Patch CNNs

Homomorphic Level Optimal Block

CryptoFace Architecture

Polynomial Feature Normalization

  • L2 Normalization: $\tilde{\mathbf{y}} = \frac{\mathbf{y}}{\|\mathbf{y}\|_2} = \mathbf{y}\cdot \frac{1}{\sqrt{\sum_{i=1}^d y_i^2}} = \mathbf{y}\cdot q\left(\sum_{i=1}^d y_i^2\right)$ with $q(t)=\frac{1}{\sqrt{t}}$ (cannot be computed in CKKS)
  • Our Solution: Approximate $q(t)=\frac{1}{\sqrt{t}}$ with a polynomial $p(t)=\beta_2 t^2 + \beta_1 t + \beta_0$
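A sketch of fitting $p(t)$ to $q(t)=1/\sqrt{t}$ by least squares; the fitting range for $t=\sum_i y_i^2$ below is illustrative — in practice it would be calibrated to the feature norms observed during training:

```python
import numpy as np

# Quadratic least-squares fit of q(t) = 1/sqrt(t) over an assumed range of
# squared norms t (the range [0.5, 2.0] is illustrative, not calibrated).
t = np.linspace(0.5, 2.0, 2001)
b2, b1, b0 = np.polyfit(t, 1.0 / np.sqrt(t), 2)

y = np.array([0.6, 0.8])                    # toy feature: sum(y**2) = 1.0
t_val = float(np.sum(y ** 2))
y_norm = y * (b2 * t_val ** 2 + b1 * t_val + b0)   # polynomial normalization
print(np.linalg.norm(y_norm))               # should land near 1
```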

Cosine Similarity Under Encryption

Experiments on Encrypted Face Datasets

Hardware & Software
  • Amazon AWS, r5.24xlarge
  • 96 CPUs, 768 GB RAM
  • Microsoft SEAL, 3.6
| Approach    | Network  | Params | Boot | LFW   | AgeDB | CALFW | CPLFW | CFP-FP | Avg   | Latency (s) | Memory (GB) |
|-------------|----------|--------|------|-------|-------|-------|-------|--------|-------|-------------|-------------|
| MPCNN       | ResNet32 | 529K   | 31   | 97.02 | 83.02 | 87.00 | 78.90 | 82.07  | 85.60 | 7,367       | 286         |
| MPCNN       | ResNet44 | 724K   | 43   | 98.27 | 87.45 | 90.85 | 83.72 | 87.90  | 89.64 | 9,845       | 286         |
| AutoFHE$^1$ | ResNet32 | 531K   | 8    | 93.53 | 80.88 | 85.40 | 75.67 | 77.96  | 82.69 | 4,001       | 286         |
| CryptoFace  | PCNNs    | 3.78M  | 1    | 98.78 | 92.90 | 93.73 | 83.95 | 87.94  | 91.46 | 1,446       | 277         |

$^1$ Architecture searched on CIFAR10.

Computational Cost Per Operation

| Approach         | Backbone | Conv          | BN         | Residual    | AvgPool    | Linear       | Activation  | Norm      | Score       | Boot          | Other      |
|------------------|----------|---------------|------------|-------------|------------|--------------|-------------|-----------|-------------|---------------|------------|
| MPCNN            | ResNet32 | 896 (12.17%)  | 28 (0.38%) | 0.5 (0.01%) | 46 (0.62%) | 807 (10.96%) | 583 (7.92%) | 5 (0.07%) | 2 (0.02%)   | 4991 (67.76%) | 7 (0.09%)  |
| MPCNN            | ResNet44 | 1214 (12.33%) | 39 (0.39%) | 0.7 (0.01%) | 46 (0.46%) | 807 (8.19%)  | 807 (8.20%) | 5 (0.05%) | 2 (0.02%)   | 6917 (70.27%) | 7 (0.08%)  |
| AutoFHE$^\dagger$ | ResNet32 | 1966 (49.13%) | 28 (0.69%) | 1.7 (0.04%) | 38 (0.95%) | 658 (16.43%) | 17 (0.43%)  | 4 (0.10%) | 1 (0.03%)   | 1274 (31.84%) | 14 (0.34%) |
| CryptoFace       | PCNNs    | 858 (62.93%)  | 2 (0.16%)  | 0.1 (0.01%) | 26 (1.94%) | 277 (20.30%) | 13 (0.94%)  | 2 (0.11%) | 0.3 (0.02%) | 141 (10.34%)  | 44 (3.26%) |

Resolution Scalability

Data Privacy
Data and Circuit Privacy

PrivaCT

Circuit and Data Privacy

Circuit Protection: Prior Solutions

  • Inference under encryption, so the circuit is never shared
    • Restricted two-party scenario
    • Cleartext evaluation of the circuit
Is the circuit protected?
    No, noise term can leak it!
  • Circuit leakage prevention
    • Elimination of the noise term (e.g., noise flooding, bootstrapping, etc.)
    • Randomization of the evaluated ciphertext

PrivaCT: Data and Circuit Privacy



Goal: Reduce all functions to a uniform external circuit (inner products).


Key Idea of PrivaCT: Two-Stage Function Approximation

PrivaCT improves accuracy for a fixed computational complexity

    • Existing approaches require increasing their polynomial degree to improve the approximation performance
      • Chebyshev approximation (green curves) $(deg=495)$ [BF24]
      • Minimax composite Chebyshev approximation (orange curves) $(degrees = [5,5,5,7,7,7,7,7,27])$ [LL+21]
    • PrivaCT's approximation improves as the number of segments $(n_{seg})$ increases for a fixed degree $deg=3$
      • It reaches the lowest Hausdorff distance $(HD = 0.645)$ at $n_{seg}=16$.
PrivaCT adapts to the function's subtle curvatures.
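The segments-versus-degree trade-off can be illustrated with a toy experiment: fix the per-segment degree at 3 and increase the number of segments (tanh stands in for the target nonlinearity, and the least-squares fit is illustrative; PrivaCT's actual fitting procedure may differ):

```python
import numpy as np

# Piecewise approximation: split [lo, hi] into n_seg segments and fit a
# fixed degree-3 polynomial per segment. More segments -> smaller error,
# without increasing the polynomial degree.
f = np.tanh
lo, hi = -4.0, 4.0

def piecewise_max_error(n_seg, deg=3):
    edges = np.linspace(lo, hi, n_seg + 1)
    worst = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        t = np.linspace(a, b, 200)
        p = np.polyfit(t, f(t), deg)        # per-segment least-squares fit
        worst = max(worst, np.abs(np.polyval(p, t) - f(t)).max())
    return worst

for n_seg in (1, 4, 16):
    print(n_seg, piecewise_max_error(n_seg))
```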

Runtime comparison between PrivaCT and state-of-the-art solutions

No loss in accuracy while achieving both data and circuit privacy, with significantly reduced runtime.

Story So Far...


Co-designing AI and FHE architectures is critical for efficiency.

Missing Piece in the Puzzle

Hardware Accelerators for FHE

Source: Duality Technologies

Concluding Remarks


  • Homomorphic encryption is the key to realizing end-to-end encrypted AI systems.
    • Can realize both data and function privacy.
    • Offers post-quantum security.
  • Naive application of HE for AI suffers from prohibitive computational costs.
  • Co-designing AI and Homomorphic architectures is critical for efficiency.
    • Can yield at least a 10x speedup.
Exciting research at the intersection of AI, Cryptography, and ASIC Design.

hal.cse.msu.edu