Supervised Learning Applications

Supervised Learning Applications - I

CSE 891: Deep Learning

Vishnu Boddeti

Monday November 03, 2021

Deep Learning Applications

Image Classification

Video Recognition

Object Detection

Instance Segmentation

Semantic Segmentation

Medical Image Classification

Classification: Object Recognition

Object Detection

Key ideas today:

train longer
multi-scale backbone: feature pyramid networks
bigger backbones: e.g., ResNeXt
very big models work better
big ensembles, more data, etc
test-time augmentations
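
As a rough illustration of the last idea, test-time augmentation can be sketched as averaging a model's class scores over several views of the same input. The names `hflip`, `predict_with_tta`, and `toy_model` are hypothetical and not from the lecture. For detection, predicted boxes would also have to be mapped back to the original image frame before merging, which this sketch leaves out.

```python
def hflip(image):
    """Horizontally flip an image stored as a list of rows."""
    return [row[::-1] for row in image]


def identity(image):
    return image


def predict_with_tta(model, image, augmentations):
    """Average the model's class scores over augmented views of one image."""
    outputs = [model(augment(image)) for augment in augmentations]
    n = len(outputs)
    # zip(*outputs) groups the scores of each class across all views.
    return [sum(scores) / n for scores in zip(*outputs)]


def toy_model(image):
    """Stand-in classifier (assumption): scores 'left-heavy' vs 'right-heavy'."""
    left = sum(row[0] for row in image)
    right = sum(row[-1] for row in image)
    total = left + right
    return [left / total, right / total]


image = [[1, 3], [1, 3]]
tta_scores = predict_with_tta(toy_model, image, [identity, hflip])
```

The toy model is deliberately sensitive to flipping, so the averaged scores come out more balanced than either single view's scores.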

Detection with CornerNet

Law and Deng, "CornerNet: Detecting Objects as Paired Keypoints", ECCV 2018
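
The "paired keypoints" idea can be sketched as follows: the network predicts top-left and bottom-right corners, each with a score and a scalar embedding, and two corners form a box when their embeddings are close and the geometry is valid. This is a simplified sketch, not the paper's full decoding step. The function name, the tuple layout, and the threshold value are assumptions made for illustration.

```python
def pair_corners(top_lefts, bottom_rights, max_embedding_gap=0.5):
    """Group corners into boxes.

    Each corner is a tuple (x, y, score, embedding) with a scalar embedding.
    The tuple layout and the default threshold are illustrative assumptions.
    """
    boxes = []
    for x1, y1, s1, e1 in top_lefts:
        for x2, y2, s2, e2 in bottom_rights:
            # A bottom-right corner must lie below and to the right.
            if x2 <= x1 or y2 <= y1:
                continue
            # Corners of the same object should have similar embeddings.
            if abs(e1 - e2) > max_embedding_gap:
                continue
            boxes.append((x1, y1, x2, y2, (s1 + s2) / 2))
    # Highest-scoring boxes first.
    return sorted(boxes, key=lambda box: box[4], reverse=True)


top_lefts = [(0, 0, 0.9, 1.0), (5, 5, 0.8, 3.0)]
bottom_rights = [(4, 4, 0.7, 1.1), (9, 9, 0.6, 3.2)]
boxes = pair_corners(top_lefts, bottom_rights)
```

In this toy input, the corner at (0, 0) is not paired with the one at (9, 9) because their embeddings differ too much, even though the geometry would allow it.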

Face Matching

Feature Extraction

Face Recognition Models

DeepFace
FaceNet

SphereFace
ArcFace

Loss Functions: Embeddings

Cosine Similarity: $\frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}$ (cosine distance is $1$ minus this value)
Triplet Loss: $\left(1+d(\mathbf{x}_i,\mathbf{x}_j)-d(\mathbf{x}_i,\mathbf{x}_k)\right)_{+}$
Mahalanobis Distance: $(\mathbf{x}-\mathbf{y})^T\mathbf{M}(\mathbf{x}-\mathbf{y})$
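
The three expressions above can be written out directly. This is a minimal sketch in plain Python. It assumes $d$ in the triplet loss is the squared Euclidean distance, with $\mathbf{x}_i$ the anchor, $\mathbf{x}_j$ the positive, and $\mathbf{x}_k$ the negative. The function names are illustrative, and the Mahalanobis expression is implemented exactly as written on the slide (the quadratic form, without a square root).

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x, y):
    """x^T y / (||x|| ||y||)."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def triplet_loss(anchor, positive, negative, d=squared_euclidean, margin=1.0):
    """Hinge on margin + d(anchor, positive) - d(anchor, negative)."""
    return max(0.0, margin + d(anchor, positive) - d(anchor, negative))


def mahalanobis_form(x, y, M):
    """(x - y)^T M (x - y) for a matrix M given as a list of rows."""
    diff = [a - b for a, b in zip(x, y)]
    M_diff = [dot(row, diff) for row in M]
    return dot(diff, M_diff)
```

With `M` set to the identity matrix, `mahalanobis_form` reduces to the squared Euclidean distance, which makes a convenient sanity check. The triplet loss is zero once the negative is farther from the anchor than the positive by at least the margin.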

Angular Loss Functions

Angular Margin Losses:

Angular Softmax (SphereFace CVPR'17) $$-\frac{1}{N}\sum_{i=1}^N\log\left(\frac{e^{\|\mathbf{x}_i\|\cos(m\theta_{y_i,i})}}{e^{\|\mathbf{x}_i\|\cos(m\theta_{y_i,i})}+\sum_{j\neq y_i}e^{\|\mathbf{x}_i\|\cos(\theta_{j,i})}}\right)$$
Additive Angular Softmax (ArcFace CVPR'19) $$-\frac{1}{N}\sum_{i=1}^N\log\left(\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos(\theta_j)}}\right)$$
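
To make the additive angular margin concrete, here is a small sketch of the ArcFace-style loss computed from precomputed cosines. It assumes `cosines[i][j]` already holds $\cos\theta_j$ between the normalized embedding of sample $i$ and the normalized weight of class $j$. The function name and the default values of `s` and `m` are illustrative choices. Production implementations also handle the case $\theta_{y_i}+m>\pi$, which this sketch ignores.

```python
import math


def arcface_loss(cosines, labels, s=64.0, m=0.5):
    """Mean additive-angular-margin softmax loss.

    cosines[i][j] is cos(theta_j) for sample i and class j (assumed to be
    computed elsewhere from L2-normalized embeddings and class weights).
    """
    total = 0.0
    for cos_row, y in zip(cosines, labels):
        logits = []
        for j, c in enumerate(cos_row):
            c = max(-1.0, min(1.0, c))  # guard acos against rounding error
            if j == y:
                # Add the margin m to the target-class angle only.
                logits.append(s * math.cos(math.acos(c) + m))
            else:
                logits.append(s * c)
        # Numerically stable log-sum-exp.
        top = max(logits)
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum - logits[y]
    return total / len(labels)


loss_without_margin = arcface_loss([[0.8, 0.1]], [0], s=10.0, m=0.0)
loss_with_margin = arcface_loss([[0.8, 0.1]], [0], s=10.0, m=0.5)
```

With `m=0` the expression reduces to an ordinary softmax cross-entropy on scaled cosines. A positive margin lowers the target logit, so the same embedding incurs a larger loss and must move closer to its class center to compensate.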

Regression: Semantic Segmentation

Regression: Instance Segmentation

Regression: Super-Resolution

Bayesian Deep Learning

Model Output Uncertainty: $p(\mathbf{y}^{*}|\mathbf{x}^{*}, \mathbf{X},\mathbf{Y})$
Model Parameter Uncertainty: $p(\mathbf{w}^{*}|\mathbf{x}^{*},\mathbf{X},\mathbf{Y})$

\begin{eqnarray} p(\mathbf{y}^{*}|\mathbf{x}^{*}, \mathbf{X},\mathbf{Y}) = \int p(\mathbf{y}^{*}|\mathbf{x}^{*},\mathbf{w}^{*})p(\mathbf{w}^{*}|\mathbf{x}^{*},\mathbf{X},\mathbf{Y})d\mathbf{w}^{*} \nonumber \end{eqnarray}
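
The integral above rarely has a closed form for deep networks. A common workaround is a Monte Carlo estimate: draw weight samples from (an approximation of) the posterior and average the per-sample predictions. The sketch below assumes a toy logistic-regression likelihood and a Gaussian stand-in for the posterior. Both are illustrative assumptions, as are all the names.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predictive_probability(x_star, weight_samples):
    """Monte Carlo estimate of p(y*=1 | x*, X, Y).

    Averages the likelihood over the given weight samples and also returns
    the variance of the per-sample predictions, which reflects how much
    parameter uncertainty affects this particular input.
    """
    probs = [
        sigmoid(sum(w_k * x_k for w_k, x_k in zip(w, x_star)))
        for w in weight_samples
    ]
    mean = sum(probs) / len(probs)
    variance = sum((p - mean) ** 2 for p in probs) / len(probs)
    return mean, variance


# Hypothetical Gaussian approximation to the weight posterior.
random.seed(0)
posterior_mean = [1.0, -0.5]
weight_samples = [
    [random.gauss(mu, 0.3) for mu in posterior_mean] for _ in range(1000)
]
mean, variance = predictive_probability([2.0, 1.0], weight_samples)
```

If every weight sample is identical, the variance is zero and the estimate collapses to a single point prediction. The spread across samples is what distinguishes the Bayesian predictive distribution from a plug-in estimate.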