What does it mean for a graph to be fully connected?
Once every data point is connected to every other point, there is no particular graph structure left to exploit, and it becomes less useful to talk about graphs at all.
It would be better to talk about sets rather than graphs in this case.
Transformers are Set Neural Networks.
They are currently the best technique for analysing sets (bags) of features.
Changing the eigenvalues in Λ expresses any operation that commutes with L.
Commonly referred to as the "graph Fourier transform" (Bruna et al., ICLR'14)
To convolve with a feature matrix X, we do the following (learning the \hat{\lambda}_i):
f(X) = \Phi \,\operatorname{diag}(\hat{\lambda}_0, \ldots, \hat{\lambda}_{n-1})\, \Phi^* X
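As a rough illustration (not from the original slides; the graph, features, and learned eigenvalues below are all placeholders), this is what the operation looks like in numpy:

```python
import numpy as np

# A small, illustrative 4-node graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                                # (unnormalised) graph Laplacian

eigvals, Phi = np.linalg.eigh(L)         # L = Phi diag(eigvals) Phi^T (real symmetric case)

X = np.random.randn(4, 3)                # node feature matrix
learned_eigvals = np.random.randn(4)     # stand-in for the learned \hat{\lambda}_i

# f(X) = Phi diag(\hat{\lambda}) Phi^* X   (Phi^* = Phi^T here, since L is real symmetric)
f_X = Phi @ np.diag(learned_eigvals) @ Phi.T @ X
print(f_X.shape)                         # (4, 3): filtered node features
```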
Spectral GNNs
Directly learning eigenvalues is typically inappropriate:
Not localized, doesn't generalize to other graphs, computationally expensive, etc.
Common solution: make the learned filter eigenvalues a function of the eigenvalues Λ of L
Typically by a degree-k polynomial function, pk
Yielding f(x) = \Phi\, p_k(\Lambda)\, \Phi^* x
Common variants:
Cubic splines (Bruna et al., ICLR'14)
Chebyshev polynomials (Defferrard et al., NeurIPS'16)
Cayley polynomials (Levie et al., Trans. Sig. Proc.'18)
Using a polynomial in L we can define a conv-GNN
With coefficients defined by c_{ij} = \left(p_k(L)\right)_{ij}
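A minimal sketch of such a polynomial filter, assuming a plain monomial basis rather than the Chebyshev/Cayley bases above (coefficient values are placeholders). Note it needs only repeated products with L, never an explicit eigendecomposition:

```python
import numpy as np

def polynomial_filter(L, x, coeffs):
    """Apply f(x) = p_k(L) x, where p_k(L) = sum_i coeffs[i] * L^i.

    Equivalent to Phi p_k(Lambda) Phi^* x, but computed with repeated
    (sparse) matrix-vector products with L only.
    """
    out = np.zeros_like(x)
    L_power_x = x.copy()                 # L^0 x
    for c in coeffs:
        out += c * L_power_x
        L_power_x = L @ L_power_x        # next power of L applied to x
    return out

# Illustrative usage on a random symmetric "Laplacian-like" matrix.
n = 5
L = np.random.randn(n, n); L = L + L.T
x = np.random.randn(n)
coeffs = [0.5, -0.3, 0.1]                # hypothetical learned polynomial coefficients
y = polynomial_filter(L, x, coeffs)
```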
Transformers with Positional Encoding for Graphs
Transformers are typically defined on sets.
Order information can be provided to transformers through positional embeddings
Sines/cosines sampled depending on position
PE_{pos,2i} = \sin\left(\frac{pos}{10000^{2i/d_{emb}}}\right) \quad\text{and}\quad PE_{pos,2i+1} = \cos\left(\frac{pos}{10000^{2i/d_{emb}}}\right)
Very similar to the DFT eigenvectors.
Positional embeddings can be interpreted as eigenvectors of the grid graph.
The same idea can be leveraged to define Transformers on graphs
Just feed some eigenvectors of the graph Laplacian (columns of Φ)
Graph Transformer from Dwivedi & Bresson (2021)
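A minimal sketch of Laplacian-eigenvector positional encodings in this spirit (the helper name, the use of the normalised Laplacian, and the choice of k are assumptions for illustration, not necessarily the exact recipe of Dwivedi & Bresson):

```python
import numpy as np

def laplacian_positional_encoding(A, k):
    """Return the k eigenvectors of the normalised graph Laplacian with the
    smallest non-trivial eigenvalues, to be used as positional features."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt   # normalised Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)                    # ascending eigenvalues
    return eigvecs[:, 1:k + 1]           # skip the trivial eigenvector (eigenvalue 0)

# Usage: concatenate the encodings to the node features before the Transformer.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)                   # a 4-cycle
X = np.random.randn(4, 8)
pe = laplacian_positional_encoding(A, k=2)
X_with_pe = np.concatenate([X, pe], axis=1)
```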
Probabilistic Graphical Models
Probabilistic Modeling
Treat graphs as probabilistic graphical models (PGMs)
Nodes: Random variables
Edges: dependencies between distributions
Encode independence and conditional independence
allows for compact representation of multivariate probability distributions
p(a,b,c,d,e) = p(a)\, p(b)\, p(c\,|\,a,b)\, p(d\,|\,c)\, p(e\,|\,c)
Markov Random Fields
A particular PGM of interest is the Markov Random Field (MRF)
It allows decomposition of the joint distribution into a product of edge potentials
MRFs are modelled through inputs X and latents H.
latents are akin to node representations
the input and latent of each node are related to each other independently of other nodes
latents are related according to the edges of the graph
Joint probability distribution:
p(X, H) = \prod_{v \in V} \phi(x_v, h_v) \prod_{(v_1, v_2) \in E} \psi(h_{v_1}, h_{v_2})
ϕ(⋅,⋅) and ψ(⋅,⋅) are real-valued potential functions
Mean-Field Inference
To embed nodes, we need to sample from the posterior, p(H|X)
estimating the posterior is intractable, even if exact potential functions are known
One popular method of resolving this is mean-field variational inference
Assume posterior is approximated by a product of node-level densities
p(H\,|\,X) \approx \prod_{v \in V} q(h_v)
q(⋅) is a density that is easy to compute and sample from, e.g., Gaussian
q(\cdot) can be estimated by minimising the KL divergence to the true posterior, \mathrm{KL}\left(\prod_{v} q(h_v) \,\big\|\, p(H\,|\,X)\right)
Minimising the KL is intractable, but it admits a favourable approximate algorithm
Mean-Field Inference as GNN
We can iteratively update q(\cdot), starting from an initial guess q^{(0)}(h) (variational inference):
\log q^{(t+1)}(h_i) = c_i + \log \phi(x_i, h_i) + \sum_{j \in \mathcal{N}_i} \int_{\mathbb{R}^k} q^{(t)}(h_j) \log \psi(h_i, h_j)\, dh_j
This aligns nicely with a message-passing GNN:
h_i = \phi\left(x_i, \bigoplus_{j \in \mathcal{N}_i} \psi(x_i, x_j)\right)
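A minimal numpy sketch of this generic message-passing form (the potentials psi and phi below are arbitrary stand-ins for small learned networks, and the graph is a placeholder):

```python
import numpy as np

def message_passing_layer(X, neighbours, psi, phi):
    """One generic message-passing step: h_i = phi(x_i, sum_j psi(x_i, x_j)).

    `neighbours[i]` lists the neighbours of node i; the sum plays the role
    of the permutation-invariant aggregator (the ⊕ above).
    """
    H = []
    for i, x_i in enumerate(X):
        messages = sum(psi(x_i, X[j]) for j in neighbours[i])
        H.append(phi(x_i, messages))
    return np.stack(H)

# Illustrative potentials: both are simple fixed functions here.
psi = lambda x_i, x_j: np.tanh(x_i + x_j)
phi = lambda x_i, m: np.maximum(x_i + m, 0.0)

X = np.random.randn(3, 4)
neighbours = {0: [1], 1: [0, 2], 2: [1]}
H = message_passing_layer(X, neighbours, psi, phi)
```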
Graph Isomorphism Testing
How powerful are Graph Neural Networks?
GNNs are a powerful tool for processing real-world graph data
But they cannot accurately solve every task specified on a graph!
Canonical example: deciding graph isomorphism
Can a GNN distinguish between two non-isomorphic graphs, i.e., produce h_{G_1} \neq h_{G_2}?
Permutation invariance mandates that two isomorphic graphs will always be indistinguishable, so this case is OK.
Weisfeiler-Leman Test
Simple but powerful way of distinguishing: pass random hashes of sums along the edges
Connection to conv-GNNs spotted very early; e.g., by GCN (Kipf & Welling, ICLR'17)
It explains why untrained GNNs work well!
Untrained ∼ random hash
The test can sometimes fail, e.g., on pairs of non-isomorphic regular graphs of the same degree
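A compact sketch of 1-WL colour refinement using Python's built-in hash (graphs are given as adjacency lists; the 4-cycle vs. path example is purely illustrative):

```python
from collections import Counter

def wl_refine(neighbours, colours, rounds=3):
    """Iteratively refine node colours: each node's new colour is a hash of
    its own colour and the multiset of its neighbours' colours (1-WL)."""
    for _ in range(rounds):
        colours = {
            v: hash((colours[v], tuple(sorted(colours[u] for u in neighbours[v]))))
            for v in neighbours
        }
    return colours

def colour_histogram(colours):
    return Counter(colours.values())

# Two graphs are possibly isomorphic only if their colour histograms match.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # 4-cycle
path  = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}          # path on 4 nodes
c1 = wl_refine(cycle, {v: 1 for v in cycle})
c2 = wl_refine(path,  {v: 1 for v in path})
print(colour_histogram(c1) == colour_histogram(c2))     # False -> not isomorphic
```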
GNNs are no more powerful than 1-WL
Over discrete features, GNNs can only be as powerful as the 1-WL test described before
One important condition for maximal power is an injective aggregator (e.g. sum)
Graph isomorphism network (GIN; Xu et al., ICLR'19) proposes a simple, maximally-expressive GNN, following this principle
h_v^{(k)} = \mathrm{MLP}^{(k)}\left(\left(1 + \epsilon^{(k)}\right) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\right)
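A minimal PyTorch sketch of a GIN-style layer following this update (the dense adjacency matrix, feature dimension, and two-layer MLP are assumptions for illustration, not the reference implementation):

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """h_v <- MLP((1 + eps) * h_v + sum_{u in N(v)} h_u), with eps learnable."""
    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, H, A):
        # A: dense (n, n) adjacency matrix; A @ H sums the neighbours' features.
        return self.mlp((1 + self.eps) * H + A @ H)

# Illustrative usage on a random graph with 5 nodes and 16-dim features.
A = (torch.rand(5, 5) > 0.5).float()
A = torch.triu(A, 1); A = A + A.T        # symmetric, no self-loops
H = torch.randn(5, 16)
out = GINLayer(16)(H, A)
```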
Higher-Order Graph Neural Networks
We can make GNNs stronger by analysing failure cases of 1-WL.
For example, just like 1-WL, GNNs cannot detect closed triangles
Augment nodes with randomised/positional features (Murphy et al., ICML'19 and You et al., ICML'19)
Can also literally count interesting subgraphs (Bouritsas et al., 2020)
k-WL labels k-tuples of nodes jointly.
Exploited by 1-2-3-GNNs (Morris et al., AAAI'19)
If features are real-valued, the injectivity of the sum aggregator falls apart.
No single aggregator is uniformly better than any other.
Geometric Deep Learning
Geometric Deep Learning
GNNs were described in terms of invariances and equivariances.
Many standard architectures, such as CNNs and Transformers, can be recovered by combining:
Local and equivariant layer specified over neighbourhoods
Activation functions
(Potentially: pooling layers that coarsen the structure)
Global and invariant layer over the entire domain
Can design a more general class of geometric deep learning architectures.
"Five Gs" of Geometric Deep Learning
Applications
Quantum Chemistry
Slide Credit: Xavier Bresson
Computer Vision
Slide Credit: Xavier Bresson
Combinatorial Optimization
Slide Credit: Xavier Bresson
Semi-Supervised Classification on Graphs
Setting:
Some nodes are labeled
All other nodes are unlabeled
Task:
Predict node label of unlabeled nodes
Evaluate loss on labeled nodes only:
\mathcal{L} = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \log Z_{lf}
\mathcal{Y}_L: set of labelled node indices
Y: label matrix
Z: GCN output (after softmax)
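A minimal sketch of evaluating this loss on the labelled nodes only (shapes and names are illustrative; torch's cross_entropy on logits is equivalent to the softmax form above):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: 6 nodes, 3 classes; these logits stand in for the GCN output.
logits = torch.randn(6, 3)                 # pre-softmax GCN outputs
labels = torch.tensor([0, 2, 1, 0, 2, 1])  # ground-truth classes (where known)
labelled_mask = torch.tensor([True, True, False, False, True, False])

# L = -sum_{l in Y_L} sum_f Y_{lf} log Z_{lf}: cross-entropy on labelled nodes only.
loss = F.cross_entropy(logits[labelled_mask], labels[labelled_mask])
```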
Normalization and Depth
What kind of normalization can we do for graphs?
Node normalization
Pair normalization (see the sketch after this list)
Edge normalization
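Assuming "pair normalization" refers to PairNorm (Zhao & Akoglu, ICLR'20), a minimal sketch: centre the node features across the graph, then rescale them to a fixed average norm (the scale s is a hyperparameter):

```python
import torch

def pair_norm(H, s=1.0):
    """PairNorm-style normalisation: centre node features across the graph,
    then rescale so the average squared feature norm equals s**2."""
    H = H - H.mean(dim=0, keepdim=True)                      # centre across nodes
    scale = s / torch.sqrt(H.pow(2).sum(dim=1).mean() + 1e-6)
    return H * scale

H = torch.randn(10, 16)    # illustrative node features
H_norm = pair_norm(H)
```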
Does depth matter for GCNs?
Scalable Inception Graph Neural Networks
Figure Credit: Michael Bronstein
Graph CNN Challenges
How to define compositionality on graphs (i.e. convolution, downsampling, and pooling on graphs)?
How to ensure generalizability across graphs?
And how to make them as numerically fast as standard CNNs?
Scalability: millions of nodes, billions of edges
How to process dynamic graphs?
How to incorporate higher-order structures?
How powerful are graph neural networks?
Limited theoretical understanding
Message-passing-based GCNs are no stronger than the Weisfeiler-Leman (1-WL) test.
Getting Started
Lots of open questions and uncharted territory.
Standardized Benchmarks: Open Graph Benchmark
Software Packages: Pytorch Geometric or Deep Graph Library
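A minimal getting-started sketch with PyTorch Geometric (assumes the library is installed; the Planetoid/Cora dataset and the two-layer GCN below are just one standard starting point, not the only option):

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root='data/Cora', name='Cora')   # standard citation-network benchmark
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN()
optimiser = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
for epoch in range(200):
    optimiser.zero_grad()
    out = model(data.x, data.edge_index)
    # Semi-supervised setting: loss only on the labelled (training) nodes.
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimiser.step()
```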