Course Project

General Instructions

Teams: Form teams of 1-3 students.

Submission: You should submit a project proposal by 26th October, 2022 on GitHub Classroom. The proposal will not be graded, but will need approval.

Your final project submission needs the following:

  1. A final report with text and figures; you can simply extend the project proposal document. Submit this as a README.md file via GitHub Classroom.
  2. Code and instructions to execute it. Make sure the code works with at least a toy dataset, if not the original full dataset. Submit this via GitHub Classroom.
  3. Project presentation (10-15 mins) towards the end of the course.

Project Proposal Write-Up

You can either propose your own project or choose from among the suggestions provided below. In both cases you need to submit a write-up of 1-2 pages describing the following:

  • Title of project and names of team members.
  • What problem do you want to work on?
  • What are the existing works that are most relevant to your problem?
  • What dataset will be used for training and evaluation? Provide a detailed description.
  • What are the metrics for evaluation? How will you compare different solutions or know if your model is working?
  • What computational resources will you use? GitHub Codespaces, Colab, MSU HPCC or other resources?

Suggested Topics

Below we describe several project ideas for you. Feel free to implement them as described or with your own variations. You can also combine different project ideas.

Most of the projects below have open-source code available. You should not use it for your implementation. You may refer to it if you are stuck on some details, but the code needs to be your own implementation. If you have questions about whether you should or should not use an existing library, please ask about it on Piazza.

1. Test-Time Adaptive Inference

In the task of test-time adaptive inference, the parameters of your neural network are custom-generated for the input data on the fly. Specifically, for each layer in the neural network, a parameter-generating function (hypernetwork) takes the activation map as input and generates the weights for processing that activation map in the next layer. An illustration of this concept is shown in the figure below.

```mermaid
flowchart LR
    input --> Conv
    input --> HyperNetwork
    HyperNetwork --> Wdb["W, D, b"]
    Wdb --> Conv
```

The goal of this project is to implement a test-time adaptive model and reproduce a specific instance of this idea, deformable convolution, described below. You may optionally also incorporate the suggested variation. This project is intended to be somewhat open-ended; while the goal is to re-implement deformable convolution, you can be creative in exactly what results you show and how you deviate from the original deformable convolution.

  • In deformable convolution, convolution is performed not on a regular grid but over an irregular grid that depends on the input activation to the convolution layer. In another variant of this idea, a modulation (scaling) of the weights also depends on the input activation to the convolution layer.

  • Optional variation: In this variation we combine the basic idea of deformable convolution with non-learnable filters:

\[y(p) = \sum_{k=1}^K w_k\cdot x(p + \Delta d_k)\]

where $\mathbf{w} \in \mathbb{R}^K$ is a $K$-dimensional filter, $p + \Delta d_k$ is the deformed grid for convolution, and $\mathbf{x}$ and $\mathbf{y}$ are the input and output activations of the convolution layer. The convolution weights $\mathbf{w}$ and the deformation $\Delta\mathbf{d}$ are predicted from the input activation as,

\[\begin{aligned} \mathbf{\Delta d} &= f_d(\mathbf{x}) \\ \mathbf{\alpha} &= f_{\alpha}(\mathbf{x}) \\ \mathbf{w} &= \mathbf{D}\mathbf{\alpha} \\ \end{aligned}\]

where $f_d(\cdot)$ and $f_{\alpha}(\cdot)$ are shallow neural networks (typically 1-2 layers deep), and the convolution weights are expressed as a linear combination of basis weights $\mathbf{D} \in \mathbb{R}^{K \times m}$, where $m$ is the number of dictionary elements. The dictionary $\mathbf{D}$ is randomly generated at initialization and kept fixed, i.e., it is not updated during backpropagation. Furthermore, the dictionary can be shared across all convolution layers in the network.

Optionally, the convolution weights and the deformation field can be made location-specific by setting,

\[\begin{aligned} \mathbf{\Delta d}_p &= f_d(\mathbf{x}, p) \\ \alpha_p &= f_{\alpha}(\mathbf{x},p) \end{aligned}\]

This can be easily implemented by modeling $f_d$ and $f_{\alpha}$ through convolutional layers. This variation of deformable convolution allows the network to adapt to the specific input image. If this variation can be successfully implemented and evaluated, it could lead to a publication.
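Below is a minimal PyTorch sketch of this optional variation, assuming single-channel activations, $K$ sampling taps, and 1-layer convolutional $f_d$ and $f_{\alpha}$ (so the offsets and coefficients are already location-specific); `grid_sample` stands in for sampling on the deformed grid, and all names and sizes are illustrative rather than taken from the papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DictDeformConv(nn.Module):
    """Sketch: a K-tap deformable filter whose weights w = D @ alpha come
    from a fixed random dictionary D, with coefficients alpha and offsets
    Delta d predicted per location by shallow conv nets f_alpha and f_d.
    Single-channel input/output for clarity."""

    def __init__(self, K=9, m=16):
        super().__init__()
        self.K, self.m = K, m
        # Fixed dictionary D in R^{K x m}: random at init, never trained.
        self.register_buffer("D", torch.randn(K, m))
        # f_d: predicts 2 offsets (dx, dy) per tap per location,
        # in normalized [-1, 1] coordinates for simplicity.
        self.f_d = nn.Conv2d(1, 2 * K, kernel_size=3, padding=1)
        # f_alpha: predicts m dictionary coefficients per location.
        self.f_alpha = nn.Conv2d(1, m, kernel_size=3, padding=1)

    def forward(self, x):
        B, _, H, W = x.shape
        # Base sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1)            # (H, W, 2)
        offsets = self.f_d(x).view(B, self.K, 2, H, W)  # Delta d per tap
        alpha = self.f_alpha(x)                         # (B, m, H, W)
        # Per-location weights w = D @ alpha, shape (B, K, H, W).
        w = torch.einsum("km,bmhw->bkhw", self.D, alpha)
        y = 0.0
        for k in range(self.K):
            grid = base.unsqueeze(0) + offsets[:, k].permute(0, 2, 3, 1)
            xk = F.grid_sample(x, grid, align_corners=True)  # x(p + d_k)
            y = y + w[:, k : k + 1] * xk                     # sum_k w_k x(.)
        return y
```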

Suggested Papers to Read:

  1. HyperNetworks
  2. Deformable Convolutional Networks
  3. Deformable ConvNets v2: More Deformable, Better Results

2. Neural Ordinary Differential Equations

NeuralODEs are a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, a NeuralODE is a continuous-depth neural network. NeuralODEs have a constant memory footprint, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed.

A NeuralODE models the rate of change of a function, as opposed to the function itself,

\[\frac{dy(t)}{dt} = f(y(t);\mathbf{\theta})\]

where $f$ is represented as a neural network with parameters $\mathbf{\theta}$.
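As a concrete illustration, here is a minimal PyTorch sketch, assuming a fixed-step Euler solver and plain autograd through the unrolled steps (the original paper uses adaptive solvers and the adjoint method for constant-memory backpropagation); the names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """f(y(t); theta): a small net modeling dy/dt rather than y itself."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim)
        )

    def forward(self, t, y):
        return self.net(y)

def euler_odeint(f, y0, t0=0.0, t1=1.0, steps=20):
    """Fixed-step Euler integration of dy/dt = f(t, y) from t0 to t1."""
    h = (t1 - t0) / steps
    y, t = y0, t0
    for _ in range(steps):
        y = y + h * f(t, y)  # one Euler step along the learned dynamics
        t = t + h
    return y

# Usage: the "network" maps y0 to y(t1) by integrating the learned dynamics.
f = ODEFunc(dim=2)
y0 = torch.randn(8, 2)    # batch of initial states
y1 = euler_odeint(f, y0)  # differentiable w.r.t. f's parameters
```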

The goal of this project is to implement a NeuralODE model and reproduce some of the main results from the original NeuralODE paper. You may optionally also try other variations of incorporating neural networks into differential equations, such as NeuralSI (listed below). This project is intended to be somewhat open-ended; while the goal is to re-implement NeuralODE, you can be creative in exactly what results you show and how you deviate from the original NeuralODE paper.

Suggested Papers to Read:

  1. Neural Ordinary Differential Equations
  2. NeuralSI: Structural Parameter Identification in Nonlinear Dynamical Systems

3. Imparting Fairness to Language Embeddings

Over the past decade, a plethora of image and language representations have been trained on large-scale datasets. These pre-trained models form the basis of many computer vision and natural language processing applications. There is ample evidence that these representations are biased with respect to different demographic groups, for example:

  1. Gender shades: Intersectional accuracy disparities in commercial gender classification
  2. Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics

The goal of this project is to implement techniques to mitigate bias in pre-trained representations. Many methods have been proposed for this purpose (a few are listed below). This project is intended to be somewhat open-ended; you can be creative in what methods you try for mitigating bias in pre-trained representations.
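As one concrete (and deliberately simple) baseline, the sketch below removes a linear "bias direction" from embeddings by projection, in the spirit of hard-debiasing approaches; the listed papers instead use adversarial and maximum-entropy formulations. All names and shapes here are illustrative.

```python
import torch

def debias_embeddings(E, pairs):
    """Sketch of linear projection debiasing (a simple baseline, not the
    method of the listed papers).
    E:     (V, d) embedding matrix
    pairs: list of (i, j) index pairs of contrasting words (e.g. "he"/"she")
           used to estimate a bias direction.
    """
    diffs = torch.stack([E[i] - E[j] for i, j in pairs])
    # First principal direction of the pair differences = bias direction g.
    _, _, Vh = torch.linalg.svd(diffs, full_matrices=False)
    g = Vh[0]                                # unit bias direction, shape (d,)
    # Remove the component of every embedding along g.
    return E - (E @ g).unsqueeze(1) * g
```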

Suggested Papers to Read:

  1. Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach
  2. On the Global Optima of Kernelized Adversarial Representation Learning

4. Deep Equilibrium Networks

Deep equilibrium models (DEQs) are a new approach to modeling sequential data. A DEQ is an implicit-depth architecture that directly solves for, and backpropagates through, the (fixed-point) equilibrium state of an (effectively) infinitely deep network.

A typical $k$-layer deep network $h:X \rightarrow Y$ is defined by a stack of layers that looks something like the following

\[\begin{aligned} z_1 &= x \\ z_{i+1} &= \sigma(W_iz_i + b_i), i=1,\dots,k-1 \\ h(x) &= W_kz_k +b_k \end{aligned}\]

To be clear, “real” deep networks have forms that are quite different, with convolutional layers, residual connections, normalizations, attention layers, etc. But it is nonetheless instructive to start with a simple network like this. We can draw this network graphically as follows:

[Figure: a standard $k$-layer feedforward network.]

Deep equilibrium models are an alternative formulation that tie the weights of all the layers in the network. These models also inject the input $x$ into every layer. The model can be mathematically expressed as,

\[\begin{aligned} z_1 &= 0 \\ z_{i+1} &= \sigma(Wz_i + Ux + b), i=1,\dots,k-1 \\ h(x) &= W_kz_k +b_k \end{aligned}\]

A pictorial illustration of these networks is shown below.

[Figure: a weight-tied network with the input $x$ injected into every layer.]
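Here is a minimal PyTorch sketch of the forward pass of such a model, using naive fixed-point iteration to find $z^\star = \sigma(Wz^\star + Ux + b)$; the original DEQ paper instead uses a quasi-Newton root solver and differentiates implicitly through the fixed point. Names and tolerances are illustrative.

```python
import torch
import torch.nn as nn

class SimpleDEQ(nn.Module):
    """Sketch of a DEQ layer: instead of unrolling z_{i+1} = sigma(W z_i + U x + b)
    for k layers, solve directly for the equilibrium z* = sigma(W z* + U x + b)."""

    def __init__(self, in_dim, hidden, tol=1e-4, max_iter=50):
        super().__init__()
        self.W = nn.Linear(hidden, hidden)  # tied weights (includes bias b)
        self.U = nn.Linear(in_dim, hidden)  # input injection
        self.tol, self.max_iter = tol, max_iter

    def forward(self, x):
        z = torch.zeros(x.shape[0], self.W.in_features, device=x.device)
        Ux = self.U(x)  # the input x is injected at every "layer"
        for _ in range(self.max_iter):
            z_next = torch.tanh(self.W(z) + Ux)
            if (z_next - z).norm() < self.tol * (1 + z.norm()):
                return z_next  # converged equilibrium z*
            z = z_next
        return z
```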

The goal of this project is to implement a DEQ model and reproduce some of the main results from the original DEQ paper. You may optionally also incorporate ideas from some followup papers. This project is intended to be somewhat open-ended; while the goal is to re-implement DEQ, you can be creative in exactly what results you show, and how you deviate from the original DEQ paper.

Suggested Papers to Read:

  1. Deep Equilibrium Models
  2. Multiscale Deep Equilibrium Models

Other Resources:

  1. Deep Implicit Layers Tutorial

5. Novel View Synthesis with NeRF

In the task of novel view synthesis, your training set consists of a set of images of a scene where you know the camera parameters (intrinsic and extrinsic) for each image. Your goal is to build a model that can synthesize images showing the scene from new viewpoints unseen in the training set.

Over the past few years, Neural Radiance Fields (NeRFs) have emerged as a simple and powerful model for this problem. NeRFs rely on the idea of volume rendering: to determine the color of a pixel, we shoot a ray originating from the camera center through the pixel and into the scene; for a set of points along the ray, we compute both the color and opacity of the 3D scene. Integrating the influence of these points gives the color of the pixel. The original NeRF paper (Mildenhall et al., ECCV 2020) proposed to train a fully-connected neural network that takes a 3D position (x, y, z) and a viewing direction as input, and outputs the RGB color and opacity of the 3D scene at that point. This network is trained to reproduce the pixel values of the training images; during inference, the network can be used to synthesize the color of pixels in novel views unseen during training.

[Figure: overview of the NeRF model.]
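To make the rendering step concrete, here is a minimal PyTorch sketch of the volume rendering computation described above, assuming the network has already produced colors and densities at sampled points along each ray; tensor names and shapes are illustrative.

```python
import torch

def volume_render(rgb, sigma, deltas):
    """Composite per-sample colors into pixel colors along each ray.
    rgb:    (R, N, 3) colors at N samples along each of R rays
    sigma:  (R, N)    densities (opacities) at those samples
    deltas: (R, N)    distances between consecutive samples
    Implements C = sum_i T_i * alpha_i * c_i, with
    alpha_i = 1 - exp(-sigma_i * delta_i) and T_i = prod_{j<i} (1 - alpha_j).
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)  # per-sample opacity
    # T_i: transmittance, the fraction of light surviving up to sample i.
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]),  # T_0 = 1
                       trans[..., :-1]], dim=-1)
    weights = trans * alpha                           # per-sample contribution
    return (weights.unsqueeze(-1) * rgb).sum(dim=-2)  # (R, 3) pixel colors
```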

The goal of this project is to implement a NeRF model and reproduce some of the main results from the original NeRF paper. You may optionally also incorporate ideas from some followup papers. This project is intended to be somewhat open-ended; while the goal is to re-implement NeRF, you can be creative in exactly what results you show, and how you deviate from the original NeRF paper.

Suggested Papers to Read:

  1. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
  2. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields