Supervised Learning Applications - I
CSE 891: Deep Learning
Vishnu Boddeti
Monday November 02, 2020
Deep Learning Applications
Image Classification
Video Recognition
Object Detection
Instance Segmentation
Semantic Segmentation
Medical Image Classification
Classification: Object Recognition
Object Detection
- Key ideas today:
- train longer
- multi-scale backbone: feature pyramid networks
- bigger backbones: e.g., ResNeXt
- very big models work better
- big ensembles, more data, etc.
- test-time augmentations
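As a rough illustration of the test-time augmentation idea listed above, the sketch below averages a model's class scores over an image and its horizontal flip. The `toy_model` function is a hypothetical stand-in (it is not from the lecture), and the image is a plain list of pixel rows.

```python
def hflip(image):
    """Return a horizontally flipped copy of an image given as a list of rows."""
    return [row[::-1] for row in image]


def predict_with_tta(model, image):
    """Average the model's per-class scores over the original and flipped views."""
    views = [image, hflip(image)]
    outputs = [model(view) for view in views]
    n_classes = len(outputs[0])
    return [sum(out[c] for out in outputs) / len(outputs) for c in range(n_classes)]


def toy_model(image):
    """Hypothetical classifier: scores are the sums of the first and last columns."""
    return [sum(row[0] for row in image), sum(row[-1] for row in image)]


image = [[1.0, 0.0],
         [1.0, 0.0]]
scores = predict_with_tta(toy_model, image)  # [1.0, 1.0]
```

For object detection the same idea applies, but boxes predicted on a flipped view have to be mapped back to the original coordinates before the predictions are merged.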
Face Matching
Feature Extraction
Regression: Semantic Segmentation
Regression: Super-Resolution
Bayesian Deep Learning
- Model Output Uncertainty: $p(\mathbf{y}^{*}|\mathbf{x}^{*}, \mathbf{X},\mathbf{Y})$
- Model Parameter Uncertainty: $p(\mathbf{w}^{*}|\mathbf{x}^{*},\mathbf{X},\mathbf{Y})$
\begin{eqnarray}
p(\mathbf{y}^{*}|\mathbf{x}^{*}, \mathbf{X},\mathbf{Y}) = \int p(\mathbf{y}^{*}|\mathbf{x}^{*},\mathbf{w}^{*})\,p(\mathbf{w}^{*}|\mathbf{x}^{*},\mathbf{X},\mathbf{Y})\,d\mathbf{w}^{*} \nonumber
\end{eqnarray}
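The integral over $\mathbf{w}^{*}$ is rarely tractable for deep networks, so it is commonly approximated by averaging predictions over samples of the parameters. The sketch below is a minimal Monte Carlo version, assuming a one-parameter logistic model and a hypothetical Gaussian stand-in for the posterior samples; neither choice comes from the slides.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predictive_probability(x_star, weight_samples):
    """Monte Carlo estimate of p(y*=1 | x*, X, Y).

    Averages the likelihood p(y*=1 | x*, w) over the given parameter samples,
    which are assumed to be drawn from the posterior over w.
    """
    probs = [sigmoid(w * x_star) for w in weight_samples]
    return sum(probs) / len(probs)


random.seed(0)
# Hypothetical posterior samples: in practice these would come from an
# approximate inference method, not from a fixed Gaussian.
weight_samples = [random.gauss(1.0, 0.5) for _ in range(1000)]
p = predictive_probability(2.0, weight_samples)
```

The spread of the individual `sigmoid(w * x_star)` values around their mean gives a simple picture of how parameter uncertainty turns into output uncertainty.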
Attention for Memory: Neural Turing Machines
NTM: Memory Read and Write
- (Blurry) Read: Read everywhere with weights
\[\mathbf{r}_t = \sum_{i} w_t(i)\mathbf{M}_t(i)\]
- (Blurry) Write: Erase and add everywhere with weights
\[\mathbf{M}_t(i) \leftarrow w_t(i)\mathbf{a}_t + \mathbf{M}_{t-1}(i)\odot(\mathbf{1}-w_t(i)\mathbf{e}_t)\]
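The read and write rules above can be sketched in a few lines. This is a minimal illustration, assuming memory is stored as a list of rows (one row per location $i$) and that the erase and add vectors act elementwise on each row; the function and variable names are illustrative only.

```python
def read(memory, w):
    """Blurry read: r = sum_i w[i] * M[i], a weighted sum of memory rows."""
    width = len(memory[0])
    return [sum(w[i] * memory[i][j] for i in range(len(memory)))
            for j in range(width)]


def write(memory, w, erase, add):
    """Blurry write: M[i] <- M[i] * (1 - w[i] * e) + w[i] * a, elementwise."""
    return [
        [m_ij * (1.0 - w_i * e_j) + w_i * a_j
         for m_ij, e_j, a_j in zip(row, erase, add)]
        for row, w_i in zip(memory, w)
    ]


memory = [[1.0, 2.0],
          [3.0, 4.0]]
w = [1.0, 0.0]                      # all attention on location 0
new_memory = write(memory, w, erase=[1.0, 1.0], add=[5.0, 6.0])
# new_memory == [[5.0, 6.0], [3.0, 4.0]]: row 0 is overwritten, row 1 is untouched
r = read(new_memory, w)             # [5.0, 6.0]
```

With soft weights (for example `w = [0.5, 0.5]`) every location is partly erased and partly updated, which is what makes the operations differentiable.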
NTM: Memory Addressing
- Content Addressing:
\[w^c_t(i) = \frac{\exp(\beta_t K[\mathbf{k}_t,\mathbf{M}_t(i)])}{\sum_j \exp(\beta_t K[\mathbf{k}_t,\mathbf{M}_t(j)])} \]
- Interpolation:
\[\mathbf{w}_t^g = g_t\mathbf{w}^c_t+(1-g_t)\mathbf{w}_{t-1}\]
- Convolutional Shifting:
\[\tilde{w}_t(i) \leftarrow \sum_{j=0}^{N-1}w^g_t(j)s_t(i-j)\]
- Sharpening:
\[w_t(i) \leftarrow \frac{\tilde{w}_t(i)^{\gamma_t}}{\sum_j \tilde{w}_t(j)^{\gamma_t}}\]
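The four addressing steps above chain together into one weighting. The sketch below is a minimal illustration, assuming the similarity $K$ is cosine similarity, that the shift distribution $s_t$ has one entry per memory location, and that indices wrap around modulo $N$; the function names are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as the similarity measure K (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)


def content_weights(key, memory, beta):
    """Content addressing: softmax over beta-scaled similarities."""
    scores = [beta * cosine(key, row) for row in memory]
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate(w_c, w_prev, g):
    """Interpolation: blend content weights with the previous weights."""
    return [g * c + (1.0 - g) * p for c, p in zip(w_c, w_prev)]


def shift(w_g, s):
    """Convolutional shifting: circular convolution of w_g with s."""
    n = len(w_g)
    return [sum(w_g[j] * s[(i - j) % n] for j in range(n)) for i in range(n)]


def sharpen(w, gamma):
    """Sharpening: raise to the power gamma and renormalize."""
    powered = [x ** gamma for x in w]
    total = sum(powered)
    return [p / total for p in powered]


def address(key, memory, w_prev, beta, g, s, gamma):
    """Run content addressing, interpolation, shifting, and sharpening in order."""
    w_c = content_weights(key, memory, beta)
    w_g = interpolate(w_c, w_prev, g)
    w_tilde = shift(w_g, s)
    return sharpen(w_tilde, gamma)


memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
w = address(key=[1.0, 0.0], memory=memory, w_prev=[0.0, 1.0, 0.0],
            beta=10.0, g=1.0, s=[1.0, 0.0, 0.0], gamma=1.0)
```

In this usage example the key matches location 0, the gate `g = 1.0` ignores the previous weights, and `s = [1.0, 0.0, 0.0]` means no shift, so nearly all of the weight lands on location 0.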
NTM: Copy Performance
NTM: Copy Comparison
- LSTM:
- NTM:
NTM: Read Write
Regression: Object Alignment
Structured Output Prediction
- Traditional Learning: Mapping $f : \mathcal{X} \rightarrow \mathbb{R}$
- Structured Output Learning: Mapping $f : \mathcal{X} \rightarrow \mathcal{Y}$
Semantic Segmentation
Image Annotations
Human Pose Estimation