Introduction

Introduction

CSE 891: Deep Learning

Vishnu Boddeti

Wednesday September 01, 2021

Class Info

Instructor: Vishnu Boddeti
Class: MW 12:40pm - 2:00pm
Location: EGR 2245
Office Hours: MW 2:00pm - 3:00pm

drawing

Administrative Stuff

Pre-Requisites

Linear Algebra
Calculus
Probability and Statistics
Fundamentals of Machine Learning
Programming Experience in Python

Websites

Course Schedule and Lecture Slides

Use Google Classroom

questions, discussions, announcements, homework submissions

Linked to Google Drive through MSU

Communication

All communication through Google Classroom
Do NOT send emails to instructor
HW/Exam submitted via email will NOT be graded
Install Google Classroom App on phone, tablet, etc.
Turn-on notification for Google Classroom
Your responsibility to check Google Classroom regularly

Computational Resources

MSU HPCC

4hr time limit on jobs

Google Colab

12hr time limit on jobs
GPU/TPU

Personal/Lab Computers

Assignments

Written Homeworks

Short homeworks, 2-4 questions
Couple of hours worth of work

Programming Assignments

a lot of programming
hours and hours of programming
days and days of debugging

Assignments and Grading

Four Written Homeworks: 25%
Four Programming Assignments: 50%
Final Exam: 25%

Assignments and Grading

Written Homeworks: top two at 7.5% and other two at 5%
Programming Assignments: top two at 15% and other two at 10%
Generous grading policy (grad school)
Getting an A vs mastering the material
Build your CV
Take advantage of extra credit

Late Days

10% reduction of points per late day.
5 free late days total (not per assignment)
use them wisely . . . save them for assignments towards the end

Book (Optional)

Machine Learning and Neural Networks

What is Machine Learning?

For many problems, programing desired behavior by hand is difficult

recognizing people and objects
understanding human speech from audio files

Machine learning approach: program an algorithm to automatically learn from data, or from experience
Some reasons you might want to use a learning algorithm:

hard to code up a solution by hand (e.g. vision, NLP)
system needs to adapt to a changing environment (e.g. spam detection)
want the system to perform better than the human programmers
privacy/fairness (e.g. ranking search results)

Types of Machine Learning?

Supervised Learning: have labeled examples of the correct behavior, i.e. ground truth input/output response
Unsupervised Learning: no labeled examples – instead, looking for interesting patterns in the data
Reinforcement Learning: earning system receives a reward signal, tries to learn to maximize the reward signal

What are Neural Networks?

Most of the biological details aren’t essential, so we use vastly simplified models of neurons.

While neural nets originally drew inspiration from the brain, nowadays we mostly think about math, statistics, etc.

Neural networks are collections of thousands (or millions) of these simple processing units that together perform useful computations.

What are Neural Networks?

But why neural networks?

Hypothesis: Most processing in the brain may be due to a single learning algorithm.

Premise: Most of human intelligence may be due to a single learning algorithm.

Conclusion: Build learning algorithms that mimic the brain.

But Why Neural Networks Now?

Inspiration from the brain

proof of concept that a neural architecture can see and hear!

Very effective across a range of applications (vision, text, speech, medicine, robotics, etc.)
Widely used in both academia and the tech industry
Powerful software frameworks (PyTorch, TensorFlow, etc.) let us quickly implement sophisticated algorithms
Current parlance: Deep Learning

Emphasizes that the algorithms often involve hierarchies with many stages of processing

Deep Learning

Deep Learning: Where does it fit?

Deep Learning=Learning Representation/Features

Traditional model of pattern recognition: fixed/hand-engineered features + trainable classifier

End-to-End learning/feature learning/deep learning: trainable features + trainable classifier

Architectures for Pattern Recognition

Classical architectures for pattern recognition: Speech Recognition

Classical architectures for pattern recognition: Image Recognition

Deep Learning = Learning Hierarchical Representations

Deep Architecture: more than one stage of non-linear feature extraction

Trainable Feature Hierarchies: End-to-End Learning

A hierarchy of trainable feature transforms

Each module transforms its input representation into a higher-level representation.
High-level features are more global and more invariant
Low-level features are shared among categories

Deep Learning Goal: Make all modules trainable and get them to learn appropriate representations.

Deep Learning

Deep Learning: many layers (stages) of processing.
For e.g., this network recognizes objects in images,
Each box consists of many neuron-like units.

Deep Learning

You can visualize what a learned feature is responding to by finding an image that excites it. (We’ll see how to do this.)
Higher layers in the network often learn higher-level, more interpretable representations

Feature Visualization

Distributed Representations

What is a representation?

Your data representation determines what questions are easy to answer.

A dictionary of word counts is good for questions like "What is the most common word in Hamlet?"
It is not so good for semantic questions like "If Alice liked Harry Potter, will she like The Hunger Games?"

What is a representation?

Idea: represent words as vectors

What is a representation?

Mathematical relationships between vectors encode the semantic relationships between words

Measure semantic similarity using dot products
Represent a web page with the average of its word vectors
Complete analogies by doing arithmetic on word vectors

"Paris is to France, as London is to ________"
Paris - France + London = ________

Designing such representations by hand is hard, so we learn from data

This is a big part of what neural nets do, whether it is supervised, unsupervised, or reinforcement learning!

Applications of Deep Learning

Supervised Learning Examples

Supervised learning: have labeled examples of the correct behavior

E.g., handwritten digit classification with the MNIST dataset

Task: given an image of a handwritten digit, predict the digit class

Input: the image
Target: the digit class

Data: 70,000 images of handwritten digits labeled by humans

Training set: first 60,000 images, used to train the network
Test set: last 10,000 images, not available during training, used to evaluate performance

Neural nets already achieved $>$ 99% accuracy in the 1990s, but we still continue to learn a lot from it

Supervised Learning Examples

What makes a "2"?

Supervised Learning Examples

Object Recognition

(Krizhevsky and Hinton, 2012)

ImageNet dataset: 1000 categories, millions of labeled images
Lots of variability in viewpoint, lighting, etc.
Error rate dropped from 26% to under 4% over just a few years!

Supervised Learning Examples

Caption Generation

Supervised Learning Examples

Neural Machine Translation

Unsupervised Learning Examples

In generative modeling, we want to learn a distribution over some dataset, such as natural images.
We can evaluate a generative model by sampling from the model and seeing if it looks like the data.

Generated Images

Unsupervised Learning Examples

The progress of generative models:

Big GAN, Brock et al, 2019:

Unsupervised Learning Examples

Generative models of text. The models like BERT, GPT-3 perform unsupervised learning by reconstructing the next words in a sentence. The GPT-3 models learns from 499 Billion Tokens and has 175 Billion parameters.

Unsupervised Learning Examples

Recent exciting result: a model called the CycleGAN takes lots of images of one category (e.g., horses) and lots of images of another category (e.g., zebras) and learns to translate between them.

Reinforcement Learning

An agent interacts with an environment (e.g., game of Breakout)

In each time step,
The agent periodically receives a reward (e.g., points)

agent receives observations (e.g., pixels) which give it information about the state (e.g., positions of ball and paddle)
agent picks an action (e.g., keystrokes) that affects the state

The agent wants to learn a policy, or mapping from observations to actions, which maximizes its average reward over time.

Reinforcement Learning

Reinforcement Learning for Control

Software and This Course

Software Frameworks

Scientific computing (NumPy)

vectorize computations (express them in terms of matrix/vector operations) to exploit hardware efficiency

Neural network frameworks: PyTorch, TensorFlow, JAX, etc.

automatic differentiation
compiling computation graphs
braries of algorithms and network primitives
support for graphics processing units (GPUs)

For this course:

PyTorch, a widely used neural net framework with a built-in automatic differentiation feature

Software Frameworks

Why take this class, if PyTorch does so much for you?

So you know what to do if something goes wrong !!
Debugging learning algorithms requires sophisticated detective work, which requires understanding what goes on beneath the hood.
That is why we derive things by hand in this class !!

Who is this course for?

You should take the course if you want to:

Understand the fundamental concepts behind deep neural networks.
Deep dive into how deep neural networks are useful and can be adapted for machine learning.

You should not take the course if:

You do not have a background in probability, statistics and machine learning.
Your goal is to use deep learning as a black-box toolkit.
Your goal is to learn how to use deep learning packages like TensforFlow or PyTorch.

Enjoy the class and master as much as you can !!

Q & A

XKCD