Supervised Learning Applications

CSE 891: Deep Learning

Vishnu Boddeti

Deep Learning Applications

Image Classification

Video Recognition

Object Detection

Instance Segmentation

Semantic Segmentation

Medical Image Classification

Classification: Object Recognition

Object Detection

Key ideas today:

train longer
multi-scale backbone: feature pyramid networks
bigger backbones: e.g., ResNeXt
very big models work better
big ensembles, more data, etc.
test-time augmentations

Detection with CornerNet

Law and Deng, "CornerNet: Detecting Objects as Paired Keypoints", ECCV 2018

Face Matching

Feature Extraction

Face Recognition Models

Loss Functions: Embeddings

Cosine Distance: $1-\frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}$
Triplet Loss: $\left(1+d(\mathbf{x}_i,\mathbf{x}_j)-d(\mathbf{x}_i,\mathbf{x}_k)\right)_{+}$
Mahalanobis Distance: $(\mathbf{x}-\mathbf{y})^T\mathbf{M}(\mathbf{x}-\mathbf{y})$
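
The three embedding measures above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the distance $d$ in the triplet loss is assumed to be squared Euclidean (the slide does not fix it), and $\mathbf{M}$ is assumed to be positive semi-definite.

```python
import numpy as np

def cosine_similarity(x, y):
    # x^T y / (||x|| ||y||); equals 1 for vectors pointing the same way.
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def cosine_distance(x, y):
    # One minus the cosine similarity, so identical directions give 0.
    return 1.0 - cosine_similarity(x, y)

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Assumption: d(.,.) is the squared Euclidean distance.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    # Hinge (.)_+ : zero once the negative is farther than the positive by the margin.
    return float(max(margin + d_pos - d_neg, 0.0))

def mahalanobis_sq(x, y, M):
    # (x - y)^T M (x - y); with M = I this is the squared Euclidean distance.
    diff = x - y
    return float(diff @ M @ diff)
```

With $\mathbf{M}$ set to the identity, `mahalanobis_sq` reduces to the squared Euclidean distance, which is a quick sanity check when learning $\mathbf{M}$.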

Angular Loss Functions

Angular Margin Losses:

Angular Softmax (SphereFace CVPR'17) $$-\frac{1}{N}\sum_{i=1}^N\log\left(\frac{e^{\|\mathbf{x}_i\|\cos(m\theta_{y_i,i})}}{e^{\|\mathbf{x}_i\|\cos(m\theta_{y_i,i})}+\sum_{j\neq y_i}e^{\|\mathbf{x}_i\|\cos(\theta_{j,i})}}\right)$$
Additive Angular Softmax (ArcFace CVPR'19) $$-\frac{1}{N}\sum_{i=1}^N\log\left(\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos(\theta_j)}}\right)$$
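
The additive angular margin loss can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: `arcface_loss` and its argument names are hypothetical, the defaults for $s$ and $m$ are placeholders, and the special handling that practical implementations add when $\theta_{y_i}+m$ exceeds $\pi$ is omitted.

```python
import numpy as np

def arcface_loss(embeddings, class_weights, labels, s=64.0, m=0.5):
    # Normalize features and class weights so their dot product is cos(theta).
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos_theta = np.clip(x @ w.T, -1.0, 1.0)          # shape (N, C)
    theta = np.arccos(cos_theta)

    rows = np.arange(len(labels))
    logits = s * cos_theta
    # Add the angular margin m only to the target-class angle.
    logits[rows, labels] = s * np.cos(theta[rows, labels] + m)

    # Numerically stable log-softmax, then mean negative log-likelihood.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[rows, labels].mean())
```

Setting `m=0` recovers an ordinary softmax cross-entropy over scaled cosines, so the margin's effect can be checked by comparing the two values on the same batch.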

Regression: Semantic Segmentation

Regression: Instance Segmentation

Regression: Super-Resolution

Bayesian Deep Learning

Model Output Uncertainty: $p(\mathbf{y}^{*}|\mathbf{x}^{*}, \mathbf{X},\mathbf{Y})$
Model Parameter Uncertainty: $p(\mathbf{w}^{*}|\mathbf{x}^{*},\mathbf{X},\mathbf{Y})$

\begin{eqnarray} p(\mathbf{y}^{*}|\mathbf{x}^{*}, \mathbf{X},\mathbf{Y}) = \int p(\mathbf{y}^{*}|\mathbf{x}^{*},\mathbf{w}^{*})p(\mathbf{w}^{*}|\mathbf{x}^{*},\mathbf{X},\mathbf{Y})d\mathbf{w}^{*} \nonumber \end{eqnarray}
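
The integral over weights is rarely available in closed form, so it is commonly approximated by averaging predictions over weight samples. The sketch below assumes a toy linear model $y = \mathbf{w}^T\mathbf{x} + \epsilon$ and a Gaussian weight posterior that is simply given; the posterior mean, covariance, noise variance, and the name `mc_predictive` are all illustrative assumptions.

```python
import numpy as np

def mc_predictive(x_star, weight_samples, noise_var=0.0):
    # One prediction per sampled weight vector: shape (T,).
    preds = weight_samples @ x_star
    # Predictive mean: average over weight samples.
    mean = preds.mean()
    # Predictive variance: spread due to weight uncertainty plus observation noise.
    var = preds.var() + noise_var
    return float(mean), float(var)

# Assumed (hypothetical) Gaussian posterior over two weights.
rng = np.random.default_rng(0)
post_mean = np.array([1.0, -2.0])
post_cov = 0.01 * np.eye(2)
samples = rng.multivariate_normal(post_mean, post_cov, size=5000)

mean, var = mc_predictive(np.array([1.0, 1.0]), samples, noise_var=0.1)
```

The variance splits into a part that shrinks as the weight posterior sharpens and a noise part that does not, which mirrors the distinction between parameter uncertainty and output uncertainty.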

3D Computer Vision

3D Representations

Depth Estimation

Depth Estimation Network

Eigen, Puhrsch, and Fergus, “Depth Map Prediction from a Single Image using a Multi-Scale Deep Network”, NeurIPS 2014
Eigen and Fergus, “Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture”, ICCV 2015
Slide Credit: Justin Johnson

Depth Estimation Loss

$L_2$ distance is scale variant

Absolute scale/depth are ambiguous from a single image

We need a scale invariant loss
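
One way to obtain such a loss is to compare depths in log space and subtract the squared mean log difference, the form used in the Eigen et al. depth paper cited above. The sketch below is a minimal NumPy version under that assumption; the function name and the `lam` parameter name are hypothetical, and depths are assumed to be strictly positive.

```python
import numpy as np

def scale_invariant_log_loss(pred_depth, true_depth, lam=1.0):
    # Per-pixel log difference d_i = log(pred_i) - log(true_i).
    d = np.log(pred_depth) - np.log(true_depth)
    n = d.size
    # mean(d^2) - lam * (sum d)^2 / n^2.
    # With lam = 1 a global rescaling of the prediction leaves the loss unchanged;
    # with lam = 0 this is a plain L2 loss in log space.
    return float(np.mean(d ** 2) - lam * np.sum(d) ** 2 / n ** 2)
```

For example, multiplying every predicted depth by the same constant adds a constant to each $d_i$, which the second term cancels exactly when `lam=1`.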

3D Surface Normal Estimation

3D Surface Normal Estimation Network

Eigen and Fergus, “Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture”, ICCV 2015
Slide Credit: Justin Johnson

Pixel2Mesh

Key Ideas:

Iterative Refinement
Graph Convolution

Vertex Aligned Features
Chamfer Loss Function

Wang et al, "Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images", ECCV 2018
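
A Chamfer loss between two point sets can be sketched in NumPy as below. This is an illustration under assumptions: it uses the sum of squared nearest-neighbour distances in both directions (some implementations average instead), it builds the full pairwise distance matrix, and the name `chamfer_distance` is hypothetical.

```python
import numpy as np

def chamfer_distance(P, Q):
    # P: (n, 3) predicted points, Q: (k, 3) target points.
    # Pairwise squared Euclidean distances, shape (n, k).
    d2 = np.sum((P[:, None, :] - Q[None, :, :]) ** 2, axis=-1)
    # For each point in P its nearest point in Q, and vice versa.
    p_to_q = d2.min(axis=1).sum()
    q_to_p = d2.min(axis=0).sum()
    return float(p_to_q + q_to_p)
```

The pairwise matrix costs O(nk) memory, so for large meshes a nearest-neighbour structure would be the usual substitute.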

Neural Radiance Fields

Mildenhall et al, “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”, ECCV 2020

3D Face Reconstruction from a Single Image

Rahul Dey and Vishnu Boddeti, “Generating Diverse 3D Reconstructions from a Single Occluded Face Image,” CVPR 2022

Face Inpainting

Rahul Dey and Vishnu Boddeti, “3DFaceFill: An Analysis-By-Synthesis Approach to Face Completion,” WACV 2022

Motion Deblurring

Purohit et al, "Region-Adaptive Dense Network for Efficient Motion Deblurring", AAAI 2019

Purohit et al, "Spatially-Attentive Patch-Hierarchical Network for Adaptive Motion Deblurring", CVPR 2020

Purohit et al, "Spatially-Adaptive Image Restoration using Distortion-Guided Networks", ICCV 2021

Motion Deblurring Results

Video from Single Motion Blurred Image

Purohit et al, "Bringing Alive Blurred Moments", CVPR 2019
