Introduction


CSE 849: Deep Learning

Vishnu Boddeti

Class Info

  • Instructor: Vishnu Boddeti
  • Class: MW 12:40pm - 2:00pm
  • Location: EGR 2245
  • Office Hours: MW 2:00pm - 3:00pm

Administrative Stuff

Pre-Requisites

  • Linear Algebra
  • Calculus
  • Probability and Statistics
  • Fundamentals of Machine Learning
  • Programming Experience in Python

Websites

Communication

  • All communication through Piazza
  • Do NOT send emails to instructor
  • HW/Exam submitted via email will NOT be graded
  • Install Piazza App on phone, tablet, etc.
  • Turn on notifications for Piazza
  • Your responsibility to check Piazza regularly
  • You can post privately. Use only if necessary.

Piazza Etiquette

  • Before asking a question, first check to see if it has already been answered.
  • Ask a specific, concrete question.
  • See StackOverflow guide on asking good questions.

  • Do not expect an answer within 30 minutes of posting.
  • Monday to Friday, 9am to 5pm EST: will try to answer within 2 hours.
  • Other times: will try to answer within 12 hours.

Computational Resources

  • GitHub Codespaces
    • Should be your first choice. You will see this option when you open the homework.
  • MSU HPCC
    • 4hr time limit on jobs
  • Google Colab
    • 12hr time limit on jobs
    • GPU/TPU
  • Personal/Lab Computers

Assignments

  • Written Homeworks
    • Short homeworks, 2-4 questions
    • Couple of hours worth of work
  • Programming Assignments
    • a lot of programming
    • hours and hours of programming
    • days and days of debugging

Course Project

  • Teams of 1 to 3 members each (3 max).
  • An intermediate report is required.
  • Project presentation at the end of the semester.
  • Deliverable: code and any notebooks that walk through your implementation and main results.
  • Project suggestions will be provided; pick one from the list. If you want to work on your own project, contact the instructor.
  • More details will be provided later.

Collaboration Policy

  • Do not look at any external solutions or code. Everything you submit should be your own work.
  • Do not share solutions with any other students. Discussing ideas with each other is ok and encouraged.
  • If you worked with someone, mention their name in your submission.

Grading (Current Plan)

  • One Written Homework: 25%
  • One Programming Homework: 25%
  • Mid-Term Exam: 25%
  • Course Project: 25%
  • Three Written Homeworks: not graded, optional
  • Three Programming Homeworks: not graded, optional

Assignments and Grading

  • Generous grading policy (grad school)
  • Getting an A vs mastering the material
  • Build your CV
  • Take advantage of extra credit

Late Days

  • 10% reduction of points per late day.
  • 3 free late days total (not per assignment)

Book (Optional)

Machine Learning and Neural Networks

What is Machine Learning?

  • For many problems, programming the desired behavior by hand is difficult
    • recognizing people and objects
    • understanding human speech from audio files
  • Machine learning approach: program an algorithm to automatically learn from data, or from experience
  • Some reasons you might want to use a learning algorithm:
    • hard to code up a solution by hand (e.g. vision, NLP)
    • system needs to adapt to a changing environment (e.g. spam detection)
    • want the system to perform better than the human programmers
    • privacy/fairness (e.g. ranking search results)

Types of Machine Learning

  • Supervised Learning: have labeled examples of the correct behavior, i.e. ground truth input/output response
  • Unsupervised Learning: no labeled examples – instead, looking for interesting patterns in the data
  • Reinforcement Learning: learning system receives a reward signal, tries to learn to maximize the reward signal

What are Neural Networks?

  • Most of the biological details aren’t essential, so we use vastly simplified models of neurons.

  • While neural nets originally drew inspiration from the brain, nowadays we mostly think about math, statistics, etc.

  • Neural networks are collections of thousands (or millions) of these simple processing units that together perform useful computations.
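
A minimal sketch of one such "simple processing unit", assuming the common textbook simplification of a weighted sum followed by a logistic nonlinearity. All weights and inputs below are made-up values for illustration, not anything learned.

```python
import math

def neuron(inputs, weights, bias):
    """One simplified neuron: weighted sum of its inputs, then a nonlinearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) activation

def layer(inputs, weight_rows, biases):
    """A layer is just many such units looking at the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical numbers: 2 inputs -> 3 hidden units -> 1 output unit.
x = [1.0, 0.5]
hidden = layer(x, [[0.2, -0.4], [1.5, 0.3], [-0.7, 0.9]], [0.0, -1.0, 0.1])
output = neuron(hidden, [1.0, -2.0, 0.5], 0.0)
print(hidden, output)
```

Real networks use the same building block, only with thousands or millions of units and with weights set by a learning algorithm rather than by hand.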

But why neural networks?

  • Hypothesis: Most processing in the brain may be due to a single learning algorithm.

  • Premise: Most of human intelligence may be due to a single learning algorithm.

  • Conclusion: Build learning algorithms that mimic the brain.

But Why Neural Networks Now?

  • Inspiration from the brain
    • proof of concept that a neural architecture can see and hear!
  • Very effective across a range of applications (vision, text, speech, medicine, robotics, etc.)
  • Resources and efforts from large corporations.
  • Tools and culture of collaborative and reproducible science.
  • Powerful software frameworks (PyTorch, TensorFlow, etc.) let us quickly implement sophisticated algorithms.
  • The term "deep learning" emphasizes that these algorithms often involve hierarchies with many stages of processing.

Deep Learning

Deep Learning: Where does it fit?

Deep Learning = Learning Representations/Features

  • Traditional model of pattern recognition: fixed/hand-engineered features + trainable classifier
  • End-to-End learning/feature learning/deep learning: trainable features + trainable classifier

Architectures for Pattern Recognition

  • Classical architectures for pattern recognition: Speech Recognition
  • Classical architectures for pattern recognition: Image Recognition

Deep Learning = Learning Hierarchical Representations

  • Deep Architecture: more than one stage of non-linear feature extraction

Trainable Feature Hierarchies: End-to-End Learning

  • A hierarchy of trainable feature transforms
    • Each module transforms its input representation into a higher-level representation.
    • High-level features are more global and more invariant
    • Low-level features are shared among categories
  • Deep Learning Goal: Make all modules trainable and get them to learn appropriate representations.
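
A rough sketch of a feature hierarchy, assuming each module is a linear map followed by a ReLU nonlinearity (one common choice, not the only one). The parameter values are hypothetical and fixed by hand; in deep learning every stage's parameters would be adjusted by training.

```python
def linear(x, W, b):
    """Linear map: one output per row of W."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    """Elementwise nonlinearity: negative values are clipped to zero."""
    return [max(0.0, a) for a in v]

def forward(x, stages):
    """Pass x through every stage, keeping each intermediate representation."""
    representations = [x]
    for W, b in stages:
        x = relu(linear(x, W, b))
        representations.append(x)
    return representations

# Hypothetical parameters: 3 raw inputs -> 2 low-level features -> 1 high-level feature.
stages = [
    ([[1.0, 0.0, 1.0], [0.0, -1.0, 0.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
reps = forward([1.0, -2.0, 0.5], stages)
for depth, rep in enumerate(reps):
    print("stage", depth, rep)
```

Each entry of `reps` is the representation after one more stage; "end-to-end" learning means the parameters in every entry of `stages` are trainable, not just the last one.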

Deep Learning

  • Deep Learning: many layers (stages) of processing.
  • E.g., this network recognizes objects in images.
  • Each box consists of many neuron-like units.

Deep Learning

  • You can visualize what a learned feature is responding to by finding an image that excites it. (We’ll see how to do this.)
  • Higher layers in the network often learn higher-level, more interpretable representations
Image
Feature Visualization
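
A toy illustration of "finding an input that excites a feature", assuming a single linear unit with hypothetical weights and an input constrained to unit length. This is only a sketch of the idea (gradient ascent on the input), not the procedure covered later in the course; in a real network the gradient would come from backpropagation through many layers.

```python
import math

weights = [0.5, -1.0, 2.0, 0.0]  # hypothetical learned feature (made-up values)

def activation(x):
    """Response of the unit to input x (a plain dot product here)."""
    return sum(w * xi for w, xi in zip(weights, x))

def normalize(x):
    """Rescale x to unit length so the activation cannot grow without bound."""
    n = math.sqrt(sum(xi * xi for xi in x))
    return [xi / n for xi in x]

def most_exciting_input(steps=100, lr=0.5):
    """Gradient ascent on the input, projecting back to unit length each step."""
    x = normalize([1.0] * len(weights))
    for _ in range(steps):
        grad = weights  # d(activation)/dx for a linear unit is just the weights
        x = normalize([xi + lr * gi for xi, gi in zip(x, grad)])
    return x

x_star = most_exciting_input()
print(x_star, activation(x_star))
```

For this linear unit the answer is simply the weight vector rescaled to unit length, which makes the sketch easy to check by eye.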

Distributed Representations

What is a representation?

  • Your data representation determines what questions are easy to answer.
    • A dictionary of word counts is good for questions like "What is the most common word in Hamlet?"
    • It is not so good for semantic questions like "If Alice liked Harry Potter, will she like The Hunger Games?"
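
As a small sketch of the word-count representation, using one line of Hamlet as a stand-in for the full text (the variable names are illustrative):

```python
from collections import Counter
import re

# One line of Hamlet stands in for the whole play.
text = "To be, or not to be, that is the question"

# Representation: a dictionary mapping each word to how often it occurs.
counts = Counter(re.findall(r"[a-z']+", text.lower()))

# "What is the most common word?" is a one-line lookup in this representation.
most_common_word, frequency = counts.most_common(1)[0]
print(most_common_word, frequency)
```

Nothing in `counts` says what any word means, which is why the same representation gives no help with the semantic question about Alice.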

What is a representation?

Idea: represent words as vectors

What is a representation?

  • Mathematical relationships between vectors encode the semantic relationships between words
    • Measure semantic similarity using dot products
    • Represent a web page with the average of its word vectors
    • Complete analogies by doing arithmetic on word vectors
      • "Paris is to France, as London is to ________"
      • France - Paris + London = ________
  • Designing such representations by hand is hard, so we learn from data
    • This is a big part of what neural nets do, whether it is supervised, unsupervised, or reinforcement learning!
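
A sketch of these three operations on hand-made toy vectors. The five dimensions and all values are hypothetical, chosen so the arithmetic is easy to follow; learned word vectors have hundreds of dimensions with no such clean meaning. Solving "Paris is to France as London is to X" for X gives X = France - Paris + London.

```python
import math

# Toy dimensions (assumed for illustration): [city, country, French, British, German]
vectors = {
    "paris":   [1.0, 0.0, 1.0, 0.0, 0.0],
    "france":  [0.0, 1.0, 1.0, 0.0, 0.0],
    "london":  [1.0, 0.0, 0.0, 1.0, 0.0],
    "england": [0.0, 1.0, 0.0, 1.0, 0.0],
    "berlin":  [1.0, 0.0, 0.0, 0.0, 1.0],
    "germany": [0.0, 1.0, 0.0, 0.0, 1.0],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def average(words):
    """Represent a document by the mean of its word vectors."""
    vs = [vectors[w] for w in words]
    return [sum(col) / len(vs) for col in zip(*vs)]

def analogy(a, b, c):
    """a is to b as c is to ?  ->  nearest word to (b - a + c), excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(dot(vectors["paris"], vectors["france"]))   # similarity via dot product
print(average(["paris", "france"]))               # "web page" vector
print(analogy("paris", "france", "london"))       # analogy by vector arithmetic
```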

Applications of Deep Learning

Supervised Learning Examples

  • Supervised learning: have labeled examples of the correct behavior
    • E.g., handwritten digit classification with the MNIST dataset
  • Task: given an image of a handwritten digit, predict the digit class
    • Input: the image
    • Target: the digit class
  • Data: 70,000 images of handwritten digits labeled by humans
    • Training set: first 60,000 images, used to train the network
    • Test set: last 10,000 images, not available during training, used to evaluate performance
  • Neural nets already achieved over 99% accuracy in the 1990s, but we still continue to learn a lot from this dataset
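
The split described above can be sketched as follows. Integers stand in for the 70,000 labeled images (no real data is loaded), and the helper names are illustrative.

```python
def split_train_test(examples, n_train=60_000):
    """First n_train examples for training, the rest held out for testing."""
    return examples[:n_train], examples[n_train:]

def accuracy(predictions, targets):
    """Fraction of examples whose predicted class equals the target class."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

examples = list(range(70_000))          # placeholders for (image, label) pairs
train_set, test_set = split_train_test(examples)
print(len(train_set), len(test_set))

# Accuracy on a tiny made-up batch of digit predictions.
print(accuracy([7, 2, 1, 0], [7, 2, 1, 4]))
```

The test set is only touched when computing the final `accuracy`; using it during training would make the reported number overly optimistic.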

Supervised Learning Examples

Image
What makes a "2"?

Supervised Learning Examples

  • Object Recognition
Image
(Krizhevsky, Sutskever, and Hinton, 2012)
  • ImageNet dataset: 1000 categories, millions of labeled images
  • Lots of variability in viewpoint, lighting, etc.
  • Error rate dropped from 26% to under 4% over just a few years!

Supervised Learning Examples

Image
Caption Generation

Supervised Learning Examples

Image
Neural Machine Translation

Unsupervised Learning Examples

  • In generative modeling, we want to learn a distribution over some dataset, such as natural images.
  • We can evaluate a generative model by sampling from the model and seeing if it looks like the data.
Image
Generated Images
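
A deliberately tiny sketch of the same recipe (learn a distribution, then sample from it), assuming one-dimensional data and a Gaussian model in place of images and a neural network. The "dataset" is synthetic and all numbers are made up.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic "dataset": 5000 numbers from a distribution the model does not know.
data = [rng.gauss(5.0, 2.0) for _ in range(5000)]

# "Learning the distribution": fit the two parameters of a Gaussian.
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)

# "Sampling from the model": draw new points from the fitted Gaussian.
samples = [rng.gauss(mu, sigma) for _ in range(5000)]

# Crude evaluation: do the samples look like the data?
print(round(mu, 2), round(sigma, 2))
print(round(statistics.fmean(samples), 2), round(statistics.pstdev(samples), 2))
```

For images there is no two-number summary to compare, which is why generated samples are usually judged by looking at them.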

Unsupervised Learning Examples

  • The progress of generative models:
Image
  • BigGAN (Brock et al., 2019):
Image

Unsupervised Learning Examples

  • Generative models of text. Models like BERT and GPT-3 perform unsupervised learning by predicting words in a sentence (masked words for BERT, the next word for GPT-3). The GPT-3 model learns from 499 billion tokens and has 175 billion parameters.
Image
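
To make "the labels come from the text itself" concrete, here is a sketch of next-word prediction using bigram counts on a made-up corpus. This count-based model is a stand-in chosen for brevity; GPT-3 uses a large neural network, not a count table.

```python
from collections import Counter, defaultdict

# Made-up toy corpus; no human-provided labels are needed.
corpus = "the cat sat on the mat . the cat ate the fish ."
tokens = corpus.split()

# Every (current word, next word) pair in the text is a free training example.
next_counts = defaultdict(Counter)
for current, following in zip(tokens, tokens[1:]):
    next_counts[current][following] += 1

def next_word_probs(word):
    """Estimated distribution over the word that follows `word`."""
    total = sum(next_counts[word].values())
    return {w: c / total for w, c in next_counts[word].items()}

def predict_next(word):
    """Most likely next word under the bigram counts."""
    return next_counts[word].most_common(1)[0][0]

print(next_word_probs("the"))
print(predict_next("the"))
```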

Unsupervised Learning Examples

  • Recent exciting result: models are being developed to generate realistic images from text prompts.
Image

Reinforcement Learning

  • An agent interacts with an environment (e.g., game of Breakout)
Image
  • In each time step,
    • agent receives observations (e.g., pixels) which give it information about the state (e.g., positions of ball and paddle)
    • agent picks an action (e.g., keystrokes) that affects the state
  • The agent periodically receives a reward (e.g., points)
  • The agent wants to learn a policy, or mapping from observations to actions, which maximizes its average reward over time.
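
The interaction loop above can be sketched with a toy environment. `ToyCatch` is a hypothetical one-dimensional stand-in for Breakout (keep the paddle under a stationary ball), and both policies are written by hand rather than learned; the point is only the observation/action/reward loop and the average-reward objective.

```python
import random

class ToyCatch:
    """Hypothetical 1-D game: reward 1 on every step the paddle is under the ball."""
    def __init__(self, width=5, steps=10, seed=0):
        self.width = width
        self.steps = steps
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.ball = self.rng.randrange(self.width)
        self.paddle = self.width // 2
        return (self.ball, self.paddle)          # observation

    def step(self, action):                      # action is -1, 0, or +1
        self.paddle = min(self.width - 1, max(0, self.paddle + action))
        self.t += 1
        reward = 1.0 if self.paddle == self.ball else 0.0
        done = self.t >= self.steps
        return (self.ball, self.paddle), reward, done

def tracking_policy(observation):
    """Mapping from observation to action: move the paddle toward the ball."""
    ball, paddle = observation
    return (ball > paddle) - (ball < paddle)

def make_random_policy(seed=1):
    rng = random.Random(seed)
    return lambda observation: rng.choice([-1, 0, 1])

def average_reward(env, policy, episodes=200):
    """Run the agent-environment loop and return reward per time step."""
    total, steps = 0.0, 0
    for _ in range(episodes):
        observation, done = env.reset(), False
        while not done:
            action = policy(observation)
            observation, reward, done = env.step(action)
            total += reward
            steps += 1
    return total / steps

tracking_avg = average_reward(ToyCatch(seed=0), tracking_policy)
random_avg = average_reward(ToyCatch(seed=0), make_random_policy())
print(tracking_avg, random_avg)
```

A reinforcement learning algorithm would have to discover something like `tracking_policy` on its own, using only the reward signal.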

Reinforcement Learning

Reinforcement Learning for Control

Software and This Course

Software Frameworks

  • Scientific computing (NumPy)
    • vectorize computations (express them in terms of matrix/vector operations) to exploit hardware efficiency
  • Neural network frameworks: PyTorch, TensorFlow, JAX, etc.
    • automatic differentiation
    • compiling computation graphs
    • libraries of algorithms and network primitives
    • support for graphics processing units (GPUs)
  • For this course:
    • PyTorch, a widely used neural net framework with a built-in automatic differentiation feature
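
To give a feel for what "automatic differentiation" means, here is a minimal scalar reverse-mode sketch in plain Python. The `Value` class is hypothetical and supports only `+` and `*`; it is not PyTorch's implementation or API, which handles tensors, many more operations, and GPUs.

```python
class Value:
    """A number that remembers how it was computed, so gradients can flow back."""
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # inputs this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        """Apply the chain rule from this value back to every input."""
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):        # outputs before the inputs they depend on
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += local * v.grad

# loss = (w*x + b)^2 with made-up numbers
w, x, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
loss = y * y
loss.backward()
print(loss.data, w.grad, x.grad, b.grad)
```

By hand, d(loss)/dw = 2(wx + b)x, d(loss)/dx = 2(wx + b)w, and d(loss)/db = 2(wx + b), which can be compared against the printed gradients.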

Software Frameworks

  • Why take this class, if PyTorch does so much for you?
    • So you know what to do if something goes wrong!
    • Debugging learning algorithms requires sophisticated detective work, which requires understanding what goes on beneath the hood.
    • That is why we derive things by hand in this class!
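
One concrete piece of such detective work is a gradient check: derive a gradient by hand, then compare it against a finite-difference estimate. The sketch below assumes a one-parameter squared-error loss with made-up numbers, and includes a deliberately wrong derivation to show what a failed check looks like.

```python
def loss(w, x, t):
    """Squared error of a one-parameter linear model: 0.5 * (w*x - t)^2."""
    return 0.5 * (w * x - t) ** 2

def grad_by_hand(w, x, t):
    """Derived with the chain rule: dL/dw = (w*x - t) * x."""
    return (w * x - t) * x

def grad_buggy(w, x, t):
    """A plausible mistake: forgetting the factor x from the chain rule."""
    return w * x - t

def grad_numeric(f, w, eps=1e-6):
    """Central finite difference: slow, but needs no derivation."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w, x, t = 1.5, 2.0, 1.0
numeric = grad_numeric(lambda w_: loss(w_, x, t), w)
print(grad_by_hand(w, x, t), numeric)   # these should agree closely
print(grad_buggy(w, x, t), numeric)     # these should not
```

The same comparison works when the gradient comes from a framework instead of from pencil and paper.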

Who is this course for?

  • You should take the course if you want to:
    • Understand the fundamental concepts behind deep neural networks.
    • Deep dive into how deep neural networks are useful and can be adapted for machine learning.
  • You should not take the course if:
    • You do not have a background in probability, statistics or machine learning.
    • Your goal is to use deep learning as a black-box toolkit.
    • Your goal is to learn how to use deep learning packages like TensorFlow or PyTorch.

Enjoy the class and master as much as you can!

Q & A

Image
XKCD