Supervised Learning Applications - I
CSE 891: Deep Learning
Vishnu Boddeti
Monday November 03, 2021
Deep Learning Applications
Image Classification
Video Recognition
Object Detection
Instance Segmentation
Semantic Segmentation
Medical Image Classification
Classification: Object Recognition
Object Detection
- Key ideas today:
- train longer
- multi-scale backbone: feature pyramid networks
- bigger backbones: e.g., ResNeXt
- very big models work better
- big ensembles, more data, etc
- test-time augmentations
Detection with CornerNet
- Law and Deng, "CornerNet: Detecting Objects as Paired Keypoints", ECCV 2018
Face Matching
Feature Extraction
Loss Functions: Embeddings
- Cosine Similarity: $\frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}$ (cosine distance is $1$ minus this value)
- Triplet Loss: $\left(1+d(\mathbf{x}_i,\mathbf{x}_j)-d(\mathbf{x}_i,\mathbf{x}_k)\right)_{+}$
- Squared Mahalanobis Distance: $(\mathbf{x}-\mathbf{y})^T\mathbf{M}(\mathbf{x}-\mathbf{y})$, with $\mathbf{M}$ positive semi-definite
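The three embedding measures above can be sketched in a few lines of plain Python. This is only an illustration: the slides do not say which distance $d$ the triplet loss uses, so squared Euclidean distance is assumed here, and all function names are hypothetical.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def cosine_similarity(x, y):
    """x^T y / (||x|| ||y||); values near 1 mean similar directions."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

def squared_euclidean(x, y):
    """Assumed choice for d(., .) in the triplet loss."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def triplet_loss(anchor, positive, negative, margin=1.0, dist=squared_euclidean):
    """Hinge (margin + d(anchor, positive) - d(anchor, negative))_+ ."""
    return max(0.0, margin + dist(anchor, positive) - dist(anchor, negative))

def mahalanobis_sq(x, y, M):
    """(x - y)^T M (x - y) for a matrix M given as a list of rows."""
    diff = [a - b for a, b in zip(x, y)]
    M_diff = [dot(row, diff) for row in M]
    return dot(diff, M_diff)

# Toy usage: the negative is far from the anchor, so the hinge is inactive.
anchor, positive, negative = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
loss = triplet_loss(anchor, positive, negative)
```

With the identity matrix as `M`, `mahalanobis_sq` reduces to the squared Euclidean distance; a learned `M` reweights and correlates the embedding dimensions.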
Angular Loss Functions
- Angular Margin Losses:
- Angular Softmax (SphereFace CVPR'17)
$$-\frac{1}{N}\sum_{i=1}^N\log\left(\frac{e^{\|\mathbf{x}_i\|\cos(m\theta_{y_i,i})}}{e^{\|\mathbf{x}_i\|\cos(m\theta_{y_i,i})}+\sum_{j\neq y_i}e^{\|\mathbf{x}_i\|\cos(\theta_{j,i})}}\right)$$
- Additive Angular Softmax (ArcFace CVPR'19)
$$-\frac{1}{N}\sum_{i=1}^N\log\left(\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos(\theta_j)}}\right)$$
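A minimal sketch of the additive angular margin loss above, in plain Python. It assumes the cosines between L2-normalized embeddings and L2-normalized class weights are already computed; the function name and the default values `s=64.0`, `m=0.5` are assumptions for illustration, not taken from the slides.

```python
import math

def arcface_loss(cosines, labels, s=64.0, m=0.5):
    """Mean additive-angular-margin loss.

    cosines[i][j] is cos(theta_j) for sample i and class j;
    labels[i] is the true class index y_i.
    """
    total = 0.0
    for cos_row, y in zip(cosines, labels):
        logits = []
        for j, c in enumerate(cos_row):
            c = max(-1.0, min(1.0, c))  # guard acos against rounding error
            if j == y:
                # Margin m is added to the angle of the target class only.
                logits.append(s * math.cos(math.acos(c) + m))
            else:
                logits.append(s * c)
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum - logits[y]  # equals -log softmax(logits)[y]
    return total / len(labels)

# Toy usage: one sample, three classes, true class 0.
loss = arcface_loss([[0.8, 0.1, -0.2]], [0])
```

This sketch does not handle the case $\theta_{y_i}+m>\pi$, where the cosine stops decreasing; practical implementations usually add a special case for it.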
Regression: Semantic Segmentation
Regression: Instance Segmentation
Regression: Super-Resolution
Bayesian Deep Learning
- Model Output Uncertainty: $p(\mathbf{y}^{*}|\mathbf{x}^{*}, \mathbf{X},\mathbf{Y})$
- Model Parameter Uncertainty: $p(\mathbf{w}^{*}|\mathbf{X},\mathbf{Y})$
\begin{eqnarray}
p(\mathbf{y}^{*}|\mathbf{x}^{*}, \mathbf{X},\mathbf{Y}) = \int p(\mathbf{y}^{*}|\mathbf{x}^{*},\mathbf{w}^{*})p(\mathbf{w}^{*}|\mathbf{X},\mathbf{Y})d\mathbf{w}^{*} \nonumber
\end{eqnarray}
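The predictive integral above is rarely tractable for deep networks, so it is commonly approximated by averaging the likelihood over sampled weights. The sketch below shows that Monte Carlo average on a toy one-parameter logistic model; the Gaussian "posterior" over the weight, the helper names, and the sample count are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mc_predictive(x_star, sample_weight, likelihood, num_samples=1000):
    """Approximate the integral of p(y*|x*, w) p(w|X, Y) dw by sampling w.

    sample_weight() draws one w from the (assumed) posterior;
    likelihood(x, w) returns p(y* = 1 | x, w).
    Returns the mean prediction and its variance across samples.
    """
    probs = [likelihood(x_star, sample_weight()) for _ in range(num_samples)]
    mean = sum(probs) / num_samples
    var = sum((p - mean) ** 2 for p in probs) / num_samples
    return mean, var

# Toy setup: scalar weight with an assumed posterior N(2, 1).
rng = random.Random(0)
sample_weight = lambda: rng.gauss(2.0, 1.0)
likelihood = lambda x, w: sigmoid(w * x)

mean, var = mc_predictive(1.0, sample_weight, likelihood)
```

The mean is the Monte Carlo estimate of the predictive probability, and the variance across samples gives a rough indication of how much parameter uncertainty affects the prediction at `x_star`.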