State Space Models
CSE 849: Deep Learning
Vishnu Boddeti
Layer Type | Training | Inference |
---|---|---|
Attention | $\mathcal{O}(n^2)$, parallel | $\mathcal{O}(n^2)$, sequential |
Recurrent | $\mathcal{O}(1)$, sequential | $\mathcal{O}(1)$, sequential |
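import numpy as np

# Assumes A (d, d), B (d, m), C (p, d) and x (sequence_length, m) are already defined.
h = np.zeros(A.shape[0])  # initial hidden state; a scalar 0 would fail under `@`
ylist = []
for i in range(sequence_length):
    h = A @ h + B @ x[i]  # state update
    y = C @ h             # readout
    ylist.append(y)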
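The recurrence above can be tried end to end with a small, self-contained sketch. The function names `ssm_scan` and `ssm_unrolled`, the matrix sizes, and the random inputs are all assumptions chosen for illustration; they do not come from the slides. The second function computes the same outputs from the unrolled sum $y_k = \sum_{j \le k} C A^{k-j} B x_j$, which follows from substituting the state update into itself with a zero initial state.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Sequential form: h_k = A h_{k-1} + B x_k, y_k = C h_k, with h starting at 0."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_k in x:
        h = A @ h + B @ x_k   # state update
        ys.append(C @ h)      # readout
    return np.stack(ys)

def ssm_unrolled(A, B, C, x):
    """Unrolled form: y_k = sum over j <= k of C A^(k-j) B x_j."""
    ys = []
    for k in range(len(x)):
        y = sum(C @ np.linalg.matrix_power(A, k - j) @ B @ x[j]
                for j in range(k + 1))
        ys.append(y)
    return np.stack(ys)

# Hypothetical sizes: state d, input m, output p, sequence length n.
rng = np.random.default_rng(0)
d, m, p, n = 4, 2, 3, 6
A = 0.5 * rng.standard_normal((d, d))
B = rng.standard_normal((d, m))
C = rng.standard_normal((p, d))
x = rng.standard_normal((n, m))

y_rec = ssm_scan(A, B, C, x)
y_unr = ssm_unrolled(A, B, C, x)
print(y_rec.shape)                 # one output vector per time step
print(np.allclose(y_rec, y_unr))   # both forms should agree up to rounding
```

The sequential form carries a single state vector from step to step, while the unrolled form computes each output independently of the others at the cost of redoing work. The unrolled version here is deliberately naive and is meant only as a correctness check, not as an efficient implementation.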
"One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great." (Rich Sutton, "The Bitter Lesson", 2019)