Introduction


CSE 891: Deep Learning

Vishnu Boddeti

Wednesday September 01, 2021

Class Info

  • Instructor: Vishnu Boddeti
  • Class: MW 12:40pm - 2:00pm
  • Location: EGR 2245
  • Office Hours: MW 2:00pm - 3:00pm

Administrative Stuff

Pre-Requisites

  • Linear Algebra
  • Calculus
  • Probability and Statistics
  • Fundamentals of Machine Learning
  • Programming Experience in Python

Websites

Communication

  • All communication through Google Classroom
  • Do NOT send emails to instructor
  • HW/exams submitted via email will NOT be graded
  • Install the Google Classroom app on your phone, tablet, etc.
  • Turn on notifications for Google Classroom
  • Your responsibility to check Google Classroom regularly

Computational Resources

  • MSU HPCC
    • 4hr time limit on jobs
  • Google Colab
    • 12hr time limit on jobs
    • GPU/TPU
  • Personal/Lab Computers

Assignments

  • Written Homeworks
    • Short homeworks, 2-4 questions
    • Couple of hours worth of work
  • Programming Assignments
    • a lot of programming
    • hours and hours of programming
    • days and days of debugging

Assignments and Grading

  • Four Written Homeworks: 25%
  • Four Programming Assignments: 50%
  • Final Exam: 25%

Assignments and Grading

  • Written Homeworks: top two at 7.5% and other two at 5%
  • Programming Assignments: top two at 15% and other two at 10%
  • Generous grading policy (grad school)
  • Getting an A vs mastering the material
  • Build your CV
  • Take advantage of extra credit

Late Days

  • 10% reduction of points per late day.
  • 5 free late days total (not per assignment)
  • Use them wisely... save them for assignments towards the end

Book (Optional)

Machine Learning and Neural Networks

What is Machine Learning?

  • For many problems, programming the desired behavior by hand is difficult
    • recognizing people and objects
    • understanding human speech from audio files
  • Machine learning approach: program an algorithm to automatically learn from data, or from experience
  • Some reasons you might want to use a learning algorithm:
    • hard to code up a solution by hand (e.g. vision, NLP)
    • system needs to adapt to a changing environment (e.g. spam detection)
    • want the system to perform better than the human programmers
    • privacy/fairness (e.g. ranking search results)

Types of Machine Learning?

  • Supervised Learning: have labeled examples of the correct behavior, i.e., ground-truth input/output pairs
  • Unsupervised Learning: no labeled examples – instead, looking for interesting patterns in the data
  • Reinforcement Learning: the learning system receives a reward signal and tries to learn to maximize that reward signal

What are Neural Networks?

  • Most of the biological details aren’t essential, so we use vastly simplified models of neurons.

  • While neural nets originally drew inspiration from the brain, nowadays we mostly think about math, statistics, etc.

  • Neural networks are collections of thousands (or millions) of these simple processing units that together perform useful computations.

What are Neural Networks?

But why neural networks?

  • Hypothesis: Most processing in the brain may be due to a single learning algorithm.

  • Premise: Most of human intelligence may be due to a single learning algorithm.

  • Conclusion: Build learning algorithms that mimic the brain.

But Why Neural Networks Now?

  • Inspiration from the brain
    • proof of concept that a neural architecture can see and hear!
  • Very effective across a range of applications (vision, text, speech, medicine, robotics, etc.)
  • Widely used in both academia and the tech industry
  • Powerful software frameworks (PyTorch, TensorFlow, etc.) let us quickly implement sophisticated algorithms
  • Current parlance: Deep Learning
    • Emphasizes that the algorithms often involve hierarchies with many stages of processing

Deep Learning

Deep Learning: Where does it fit?

Deep Learning = Learning Representations/Features

  • Traditional model of pattern recognition: fixed/hand-engineered features + trainable classifier
  • End-to-End learning/feature learning/deep learning: trainable features + trainable classifier

Architectures for Pattern Recognition

  • Classical architectures for pattern recognition: Speech Recognition
  • Classical architectures for pattern recognition: Image Recognition

Deep Learning = Learning Hierarchical Representations

  • Deep Architecture: more than one stage of non-linear feature extraction

Trainable Feature Hierarchies: End-to-End Learning

  • A hierarchy of trainable feature transforms
    • Each module transforms its input representation into a higher-level representation.
    • High-level features are more global and more invariant
    • Low-level features are shared among categories
  • Deep Learning Goal: Make all modules trainable and get them to learn appropriate representations.

Deep Learning

  • Deep Learning: many layers (stages) of processing.
  • E.g., this network recognizes objects in images.
  • Each box consists of many neuron-like units.

Deep Learning

  • You can visualize what a learned feature is responding to by finding an image that excites it. (We’ll see how to do this.)
  • Higher layers in the network often learn higher-level, more interpretable representations
Image
Feature Visualization

Distributed Representations

What is a representation?

  • Your data representation determines what questions are easy to answer.
    • A dictionary of word counts is good for questions like "What is the most common word in Hamlet?"
    • It is not so good for semantic questions like "If Alice liked Harry Potter, will she like The Hunger Games?"
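
As a concrete illustration of the word-count representation above, here is a minimal Python sketch; it assumes a plain-text copy of the play saved locally as "hamlet.txt" (a hypothetical path).

    # Bag-of-words (word-count) representation: frequency questions become trivial.
    from collections import Counter
    import re

    with open("hamlet.txt") as f:              # hypothetical local copy of the play
        words = re.findall(r"[a-z']+", f.read().lower())

    counts = Counter(words)                    # the word-count representation
    print(counts.most_common(5))               # answers "what is the most common word?"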

What is a representation?

Idea: represent words as vectors

What is a representation?

  • Mathematical relationships between vectors encode the semantic relationships between words
    • Measure semantic similarity using dot products
    • Represent a web page with the average of its word vectors
    • Complete analogies by doing arithmetic on word vectors
      • "Paris is to France, as London is to ________"
      • France - Paris + London = ________
  • Designing such representations by hand is hard, so we learn from data
    • This is a big part of what neural nets do, whether it is supervised, unsupervised, or reinforcement learning!
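
The bullets above can be made concrete with a small Python/NumPy sketch. The embeddings below are random stand-ins (so the printed answer will not be meaningful); with vectors from a trained model such as word2vec or GloVe, the same arithmetic tends to complete the analogy correctly.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["paris", "france", "london", "england", "berlin", "germany"]
    E = {w: rng.normal(size=50) for w in vocab}     # stand-in 50-d word vectors

    def cosine(u, v):
        # semantic similarity as a normalized dot product
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # complete "Paris is to France as London is to ___" by vector arithmetic
    query = E["france"] - E["paris"] + E["london"]
    answer = max((w for w in vocab if w not in {"paris", "france", "london"}),
                 key=lambda w: cosine(query, E[w]))
    print(answer)    # with trained embeddings this tends to be "england"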

Applications of Deep Learning

Supervised Learning Examples

  • Supervised learning: have labeled examples of the correct behavior
    • E.g., handwritten digit classification with the MNIST dataset
  • Task: given an image of a handwritten digit, predict the digit class
    • Input: the image
    • Target: the digit class
  • Data: 70,000 images of handwritten digits labeled by humans
    • Training set: first 60,000 images, used to train the network
    • Test set: last 10,000 images, not available during training, used to evaluate performance
  • Neural nets already achieved >99% accuracy in the 1990s, but we still continue to learn a lot from it
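
To make the setup above concrete, here is a minimal PyTorch/torchvision sketch of MNIST training; the architecture and hyperparameters are illustrative choices, not part of the course requirements.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    to_tensor = transforms.ToTensor()
    train_set = datasets.MNIST("data", train=True, download=True, transform=to_tensor)    # 60,000 images
    test_set = datasets.MNIST("data", train=False, download=True, transform=to_tensor)    # 10,000 images
    train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

    # input: 28x28 image; target: digit class in {0, ..., 9}
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for images, labels in train_loader:     # one pass over the training set
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()                     # gradients via automatic differentiation
        optimizer.step()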

Supervised Learning Examples

Image
What makes a "2"?

Supervised Learning Examples

  • Object Recognition
Image
(Krizhevsky and Hinton, 2012)
  • ImageNet dataset: 1000 categories, millions of labeled images
  • Lots of variability in viewpoint, lighting, etc.
  • Error rate dropped from 26% to under 4% over just a few years!

Supervised Learning Examples

Image
Caption Generation

Supervised Learning Examples

Image
Neural Machine Translation

Unsupervised Learning Examples

  • In generative modeling, we want to learn a distribution over some dataset, such as natural images.
  • We can evaluate a generative model by sampling from the model and seeing if it looks like the data.
Image
Generated Images

Unsupervised Learning Examples

  • The progress of generative models:
Image
  • Big GAN, Brock et al, 2019:
Image

Unsupervised Learning Examples

  • Generative models of text: models like BERT and GPT-3 perform unsupervised learning by reconstructing or predicting words in a sentence. GPT-3 learns from 499 billion tokens and has 175 billion parameters.
Image

Unsupervised Learning Examples

  • Recent exciting result: a model called the CycleGAN takes lots of images of one category (e.g., horses) and lots of images of another category (e.g., zebras) and learns to translate between them.
Image

Reinforcement Learning

  • An agent interacts with an environment (e.g., game of Breakout)
Image
  • In each time step,
    • the agent receives observations (e.g., pixels) which give it information about the state (e.g., positions of the ball and paddle)
    • the agent picks an action (e.g., keystrokes) that affects the state
  • The agent periodically receives a reward (e.g., points)
  • The agent wants to learn a policy, or mapping from observations to actions, which maximizes its average reward over time.
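
A minimal sketch of this interaction loop, using the Gymnasium API as an assumed stand-in (CartPole instead of Breakout, and a random placeholder instead of a learned policy):

    import gymnasium as gym

    env = gym.make("CartPole-v1")           # stand-in environment
    obs, info = env.reset()

    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # placeholder policy: random action
        obs, reward, terminated, truncated, info = env.step(action)   # observation + reward
        total_reward += reward
        done = terminated or truncated

    print("return for this episode:", total_reward)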

Reinforcement Learning

Reinforcement Learning for Control

Software and This Course

Software Frameworks

  • Scientific computing (NumPy)
    • vectorize computations (express them in terms of matrix/vector operations) to exploit hardware efficiency
  • Neural network frameworks: PyTorch, TensorFlow, JAX, etc.
    • automatic differentiation
    • compiling computation graphs
    • libraries of algorithms and network primitives
    • support for graphics processing units (GPUs)
  • For this course:
    • PyTorch, a widely used neural net framework with a built-in automatic differentiation feature
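
Two quick illustrations of the points above (toy values, purely illustrative): vectorization with NumPy and automatic differentiation with PyTorch.

    import numpy as np
    import torch

    # Vectorized: one matrix-vector product instead of an explicit Python loop.
    W = np.random.randn(1000, 1000)
    x = np.random.randn(1000)
    y = W @ x                               # runs in optimized native code

    # Automatic differentiation: PyTorch records the computation and
    # computes gradients with a single call to backward().
    w = torch.tensor(3.0, requires_grad=True)
    loss = w ** 2 + 2 * w                   # a toy scalar "loss"
    loss.backward()
    print(w.grad)                           # d(loss)/dw = 2*w + 2 = 8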

Software Frameworks

  • Why take this class, if PyTorch does so much for you?
    • So you know what to do if something goes wrong!!
    • Debugging learning algorithms requires sophisticated detective work, which requires understanding what goes on under the hood.
    • That is why we derive things by hand in this class!!

Who is this course for?

  • You should take the course if you want to:
    • Understand the fundamental concepts behind deep neural networks.
    • Deep dive into how deep neural networks are useful and can be adapted for machine learning.
  • You should not take the course if:
    • You do not have a background in probability, statistics and machine learning.
    • Your goal is to use deep learning as a black-box toolkit.
    • Your goal is to learn how to use deep learning packages like TensorFlow or PyTorch.

Enjoy the class and master as much as you can!!

Q & A

Image
XKCD