Interpretability
CSE 849: Deep Learning
Vishnu Boddeti
- Start from an arbitrary image
- Pick an arbitrary category
- Modify the image to maximize the class score (typically via gradient ascent on the pixels)
- Stop when the network is fooled, i.e. it predicts the chosen category
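
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the lecture's implementation: the "network" is a hypothetical random linear classifier on a flattened 8x8 image, so the gradient of a class score with respect to the pixels is available in closed form, and the names `scores` and `make_fooling_image`, the step size, and the gradient normalization are all illustrative choices. With a real network the gradient would come from backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained network: a linear classifier with
# 5 classes on a flattened 8x8 image (assumption for illustration only).
num_classes, num_pixels = 5, 64
W = rng.normal(size=(num_classes, num_pixels))
b = rng.normal(size=num_classes)


def scores(x):
    """Return the class scores (logits) for a flattened image x."""
    return W @ x + b


def make_fooling_image(x0, target, step=0.1, max_iters=1000):
    """Gradient ascent on the target class score until it becomes the argmax."""
    x = x0.copy()
    for it in range(max_iters):
        # Stop when the network is fooled.
        if np.argmax(scores(x)) == target:
            return x, it
        # d(scores[target]) / dx; for a linear model this is simply W[target].
        grad = W[target]
        # Normalized ascent step (the normalization is an assumption here).
        x += step * grad / np.linalg.norm(grad)
    return x, max_iters


# Start from an arbitrary image.
x0 = rng.normal(size=num_pixels)
# Pick a category that differs from the current prediction.
target = int((np.argmax(scores(x0)) + 1) % num_classes)

x_fool, iters = make_fooling_image(x0, target)
print("iterations:", iters)
print("pixel change (L2):", np.linalg.norm(x_fool - x0))
```

The loop checks the prediction before each update, so it returns as soon as the chosen category wins, which keeps the change to the image as small as the step size allows.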
- Forward: compute activations at chosen layer
- Set gradient of chosen layer equal to its activation
- Backward: compute gradient on image
- Update image with a gradient ascent step
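
The forward/backward steps above can be sketched as follows. Setting the gradient at the chosen layer equal to its activation `a` is the same as ascending the objective 0.5 * ||a||^2, so each step amplifies whatever the layer already responds to. The sketch is a minimal one under stated assumptions: the network up to the chosen layer is a hypothetical single ReLU layer with random weights, and the names `forward` and `dream_step`, the step size, and the gradient normalization are illustrative rather than taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the network up to the chosen layer:
# one ReLU layer on a flattened 8x8 image (assumption for illustration only).
num_pixels, num_units = 64, 16
W1 = rng.normal(size=(num_units, num_pixels)) / np.sqrt(num_pixels)
b1 = np.zeros(num_units)


def forward(x):
    """Return pre-activations and ReLU activations at the chosen layer."""
    pre = W1 @ x + b1
    act = np.maximum(pre, 0.0)
    return pre, act


def dream_step(x, step=0.05):
    """Apply one forward/backward/update cycle to the image x."""
    # Forward: compute activations at the chosen layer.
    pre, act = forward(x)
    # Set the gradient of the chosen layer equal to its activation.
    grad_act = act
    # Backward: propagate through the ReLU and the linear map to the image.
    grad_pre = grad_act * (pre > 0)
    grad_x = W1.T @ grad_pre
    # Normalize the gradient magnitude (an assumption, for a stable step size).
    grad_x = grad_x / (np.abs(grad_x).mean() + 1e-8)
    # Update the image with a gradient ascent step.
    return x + step * grad_x


x = rng.normal(size=num_pixels)
energy_before = 0.5 * np.sum(forward(x)[1] ** 2)
for _ in range(20):
    x = dream_step(x)
energy_after = 0.5 * np.sum(forward(x)[1] ** 2)
print("activation energy before:", energy_before)
print("activation energy after:", energy_after)
```

In a deep network the backward pass would be handled by an automatic differentiation framework; the explicit ReLU mask and matrix transpose here only make the chain rule visible for the one-layer case.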