Feed Forward Networks: Introduction


CSE 891: Deep Learning

Vishnu Boddeti

Wednesday September 08, 2021

Today

  • Artificial Neuron
  • Activation Functions
  • Capacity of Neural Networks
  • Biological Inspiration

Simplest Neural Network

Artificial Neuron

  • Neuron pre-activation (or input activation)
  • $a(\mathbf{x}) = b + \sum_i w_ix_i = b + \mathbf{w}^T\mathbf{x}$
  • Neuron (output) activation
  • $h(\mathbf{x}) = g(a(\mathbf{x})) = g\left(b+\sum_iw_ix_i\right)$
  • $\mathbf{w}$ are the connection weights
  • $b$ is the neuron bias
  • $g(\cdot)$ is called the activation function
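
As a minimal sketch (not from the slides; the weights below are made up for illustration), a single artificial neuron in NumPy:

```python
import numpy as np

def neuron(x, w, b, g):
    a = b + np.dot(w, x)  # pre-activation a(x) = b + w^T x
    return g(a)           # output activation h(x) = g(a(x))

# Example with a sigmoid activation and made-up weights
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.3, 0.1, -0.4])
print(neuron(x, w, b=0.2, g=sigmoid))
```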

Artificial Neuron

  • output range is determined by $g(\cdot)$
  • the bias $b$ only shifts the position of the ridge

Linear Activation

$g(x)=x$
  • Performs no input squashing
  • Quite a boring function...

Sigmoid Activation

$g(x)=\frac{1}{1+e^{-x}}$
  • Squashes the neuron's pre-activation between 0 and 1
  • Always positive
  • Bounded
  • Strictly increasing

Tanh Activation

$g(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$
  • Squashes the neuron's pre-activation between -1 and 1
  • Can be positive or negative
  • Bounded
  • Strictly increasing

Rectified Linear Unit Activation

$g(x)=\max(0,x)$
  • Bounded below by 0 (always non-negative)
  • Not upper bounded
  • Strictly increasing
  • Tends to yield neurons with sparse activities
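
A minimal NumPy sketch of the four activation functions above, evaluated on a few example pre-activations:

```python
import numpy as np

def linear(a):   # g(a) = a: performs no squashing
    return a

def sigmoid(a):  # squashes into (0, 1); always positive, bounded, increasing
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):     # squashes into (-1, 1); bounded, increasing
    return np.tanh(a)

def relu(a):     # non-negative, unbounded above; yields sparse activations
    return np.maximum(0.0, a)

a = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for g in (linear, sigmoid, tanh, relu):
    print(g.__name__, g(a))
```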

Capacity of Neural Networks

Single Neuron

  • Could do binary classification:
    • with sigmoid, can interpret neuron as estimating $p(y=1|\mathbf{x})$
    • also known as logistic regression classifier
    • if greater than 0.5, predict class 1
    • otherwise, predict class 0
    • similar idea can be used with Tanh
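
A sketch of this decision rule with hand-picked (not learned) weights; since the sigmoid is monotonic, $p > 0.5$ exactly when the pre-activation is positive:

```python
import numpy as np

def predict(x, w, b):
    p = 1.0 / (1.0 + np.exp(-(b + np.dot(w, x))))  # sigmoid => p(y=1|x)
    return 1 if p > 0.5 else 0

# p > 0.5 exactly when the pre-activation b + w^T x > 0
print(predict(np.array([0.5, 1.5]), w=np.array([2.0, -1.0]), b=0.1))
```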

Capacity of a Single Neuron

  • Can solve linearly separable problems

Capacity of a Single Neuron

  • Cannot solve non-linearly separable problems....
  • ...unless the input is transformed into a better representation
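
To make the second point concrete, a toy sketch with hand-picked (purely illustrative) weights: XOR is not linearly separable in the raw inputs, but adding the feature $x_1 x_2$ yields a representation a single neuron can separate:

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# XOR in the raw 2D input space is not linearly separable, but adding the
# hand-crafted feature x1*x2 makes it separable by a single neuron.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
phi = np.column_stack([X, X[:, 0] * X[:, 1]])  # transformed representation

w = np.array([20.0, 20.0, -40.0])  # hand-picked weights on (x1, x2, x1*x2)
b = -10.0
print(np.round(sigmoid(phi @ w + b)))  # -> [0. 1. 1. 0.], i.e. XOR
```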

Neural Network with Hidden Layer

  • Hidden layer pre-activation:
  • $\mathbf{a}(\mathbf{x}) = \mathbf{b}_1 + \mathbf{W}_1\mathbf{x}$ $\left(a(\mathbf{x})_i = b_{1,i} + \sum_{j}W_{1,ij}x_j\right)$
  • Hidden layer activation:
  • $\mathbf{h}(\mathbf{x}) = \mathbf{g}(\mathbf{a}(\mathbf{x}))$
  • Output layer activation:
  • $f(\mathbf{x}) = o\left(b_2 + \mathbf{w}_2^T\mathbf{h}(\mathbf{x})\right)$
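
A toy illustration of these three equations, with hand-constructed (not learned) weights that make the network compute XOR:

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hidden layer: an OR-like unit and an AND-like unit (hand-picked weights)
W1 = np.array([[20.0, 20.0],   # ~OR
               [20.0, 20.0]])  # ~AND
b1 = np.array([-10.0, -30.0])

# Output layer: "OR and not AND" combines into XOR
w2 = np.array([20.0, -20.0])
b2 = -10.0

H = sigmoid(X @ W1.T + b1)  # h(x) = g(b1 + W1 x), row-wise over X
f = sigmoid(H @ w2 + b2)    # f(x) = o(b2 + w2^T h(x)), with o = sigmoid
print(np.round(f))          # -> [0. 1. 1. 0.], i.e. XOR
```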

Softmax Activation Function

  • For multi-class classification:
    • we need multiple outputs (1 output per class)
    • we would like to estimate the conditional probability $p(y=c|\mathbf{x})$
  • Softmax activation function at the output:
  • $\mathbf{o}(\mathbf{a}) = \textrm{softmax}(\mathbf{a}) = \left[\frac{\exp(a_1)}{\sum_c\exp(a_c)},\dots,\frac{\exp(a_C)}{\sum_c\exp(a_c)}\right]^T$
    • strictly positive
    • sums to one
  • Predicted class: one with highest estimated probability
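
A short sketch of the softmax; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result, since softmax is invariant to shifting all inputs:

```python
import numpy as np

def softmax(a):
    # Subtracting max(a) avoids overflow in exp; the result is unchanged
    # because softmax is invariant to adding a constant to every entry.
    e = np.exp(a - np.max(a))
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))
print(p, p.sum())    # strictly positive entries that sum to one
print(np.argmax(p))  # predicted class: highest estimated probability
```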

Multi-Layer Neural Network

  • Could have $L$ hidden layers:
    • layer pre-activation for $k>0$ ($\mathbf{h}^{(0)}(\mathbf{x})=\mathbf{x}$)
    • $\mathbf{a}^{(k)}(\mathbf{x}) = \mathbf{b}^{(k)} + \mathbf{W}^{(k)}\mathbf{h}^{(k-1)}(\mathbf{x})$
    • hidden layer activation ($k$ from 1 to $L$):
    • $\mathbf{h}^{(k)}(\mathbf{x}) = \mathbf{g}(\mathbf{a}^{(k)}(\mathbf{x}))$
    • output layer activation ($k=L+1$):
    • $\mathbf{h}^{(L+1)}(\mathbf{x}) = \mathbf{o}(\mathbf{a}^{(L+1)}(\mathbf{x})) = f(\mathbf{x})$
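
A compact sketch of this forward pass; the layer sizes, random weights, and helper activations below are illustrative assumptions, not part of the slides:

```python
import numpy as np

def mlp_forward(x, weights, biases, g, o):
    # weights/biases list the parameters W^(k), b^(k) for k = 1..L+1;
    # the final pair belongs to the output layer.
    h = x                                   # h^(0)(x) = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = g(b + W @ h)                    # h^(k)(x) = g(a^(k)(x))
    return o(biases[-1] + weights[-1] @ h)  # f(x) = o(a^(L+1)(x))

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]  # input, two hidden layers, output (illustrative)
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
relu = lambda a: np.maximum(0.0, a)
softmax = lambda a: np.exp(a - a.max()) / np.exp(a - a.max()).sum()
print(mlp_forward(np.ones(3), weights, biases, g=relu, o=softmax))
```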

Capacity of Single Hidden Layer Neural Network

Universal Approximation

  • Universal approximation theorem (Hornik, 1991):
    • "a single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units"
  • The result applies for sigmoid, tanh and many other hidden layer activation functions.
  • This is a good result, but it doesn’t mean there is a learning algorithm that can find the necessary parameter values.
  • Many other function classes are also known to be universal approximators.
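
As a toy illustration of the theorem's flavor (hand-constructed units, not a learning algorithm): enough steep sigmoid "steps" can stack into a staircase that tracks a smooth target function:

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

target = np.sin
knots = np.linspace(0.0, 2 * np.pi, 50)                # hidden-unit step locations
heights = np.diff(target(knots), prepend=target(0.0))  # step increments

def approx(t, steepness=50.0):
    # Each hidden unit is a steep sigmoid that switches on near its knot;
    # the output layer stacks the steps into a staircase approximation.
    steps = sigmoid(steepness * (t[:, None] - knots[None, :]))
    return steps @ heights

t = np.linspace(0.0, 2 * np.pi, 200)
print(np.max(np.abs(approx(t) - target(t))))  # small; shrinks with more knots
```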

Biological Inspiration

Human Brain: Visual Cortex

Biological Neurons

  • The human brain is estimated to have around $10^{10}$-$10^{11}$ neurons:
    • Dendrites: receive information from other neurons
    • Soma: "processes" information inside the cell body
    • Axon: "cable" that sends information to other neurons
    • Synapses: connections between axons and dendrites

Biological Neurons

How do Biological Neurons Work?

  • Action Potential: electrical impulse that travels through the axon.
    • communication between neurons
    • generates "spike" in the electric potential (voltage) of the axon
    • an action potential is generated when a neuron receives enough spikes (above a threshold) in the "right" pattern from other neurons
  • Neurons can generate several such spikes every second:
    • firing rate: frequency of spikes, characterizes activity of neuron
    • neurons are always firing a little bit (spontaneous firing rate), but will fire more given the right stimulus.

The Connection

  • A neuron's firing rate is influenced by the firing rates of its input neurons:
    • "excite": increase the firing rate
    • "inhibit": decrease the firing rate
  • Artificial Neuron Approximation:
    • activation roughly corresponds to the firing rate
    • weights model whether neurons excite or inhibit each other
    • activation function and bias model the threshold behavior of neuron firing

Q & A

[Image: XKCD comic]