Classification
To support Graph Neural Network (GNN) architectures, pygrank provides a mechanism
for propagating latent representations through graph filters. This mechanism takes
as input backend primitives organized into matrices, applies the graph filter to
each of their columns, and merges the results into matrices of the same
dimensions.
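As a rough illustration of this column-wise behavior, the NumPy sketch below applies a personalized PageRank filter with a fixed number of iterations to each column of a feature matrix and stacks the results back into a matrix of the same shape. It assumes a dense, symmetrically normalized adjacency matrix and the update r ← a·W·r + (1 − a)·p; the helper names are hypothetical and are not pygrank's API. Because this filter is linear, propagating the whole matrix at once gives the same result as going column by column.

```python
import numpy as np


def normalize_symmetric(adjacency):
    # W = D^{-1/2} A D^{-1/2}; isolated nodes keep all-zero rows and columns
    degrees = adjacency.sum(axis=1)
    inv_sqrt = np.zeros_like(degrees, dtype=float)
    nonzero = degrees > 0
    inv_sqrt[nonzero] = degrees[nonzero] ** -0.5
    return adjacency * inv_sqrt[:, None] * inv_sqrt[None, :]


def pagerank_filter(W, personalization, alpha=0.9, iters=10):
    # fixed number of iterations of r <- alpha * W r + (1 - alpha) * p
    r = personalization.copy()
    for _ in range(iters):
        r = alpha * (W @ r) + (1 - alpha) * personalization
    return r


def propagate_columns(W, features, alpha=0.9, iters=10):
    # apply the filter to each column separately, then stack the outputs
    columns = [pagerank_filter(W, features[:, j], alpha, iters)
               for j in range(features.shape[1])]
    return np.stack(columns, axis=1)


adjacency = np.array([[0, 1, 1, 0],
                      [1, 0, 1, 0],
                      [1, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
features = np.random.default_rng(0).random((4, 3))
W = normalize_symmetric(adjacency)
per_column = propagate_columns(W, features)
whole_matrix = pagerank_filter(W, features)  # linear filter: same result
print(per_column.shape, np.allclose(per_column, whole_matrix))
```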
For example, a predict-then-propagate architecture [klicpera2018predict] can be implemented in tensorflow with the following code:
import tensorflow as tf
import pygrank as pg
from tensorflow.keras.layers import Dropout, Dense
pg.load_backend('tensorflow')  # forces propagation through the tensorflow pipeline

class APPNP(tf.keras.Sequential):
    def __init__(self, num_inputs, num_outputs, hidden=64):
        super().__init__([
            Dropout(0.5, input_shape=(num_inputs,)),
            Dense(hidden, activation="relu", kernel_regularizer=tf.keras.regularizers.L2(1.E-5)),
            Dropout(0.5),
            Dense(num_outputs, activation="relu")])
        self.ranker = pg.PageRank(0.9, renormalize=True, assume_immutability=True,
                                  error_type="iters", max_iters=10)  # force 10 iterations
        self.input_spec = None  # prevents some versions of tensorflow from checking call inputs

    def call(self, inputs, training=False):
        graph, features = inputs
        predict = super().call(features, training=training)
        propagate = self.ranker.propagate(graph, predict, graph_dropout=0.5 if training else 0)
        return tf.nn.softmax(propagate, axis=1)
In the above code, the propagate method of the graph filter performs the propagation.
Additionally, we make use of the package's ability to renormalize the adjacency
matrix and to apply graph dropout during forward computations (the latter only while training).
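To make these two options concrete, here is a NumPy sketch under two assumptions that are not taken from pygrank's source: that renormalization means the GCN self-loop trick, i.e. normalizing A + I instead of A, and that graph dropout randomly removes edges with the given probability while training. pygrank's own implementation may differ, for example in whether surviving edge weights are rescaled. The function names are hypothetical.

```python
import numpy as np


def renormalized_adjacency(adjacency):
    # add self-loops, then normalize symmetrically: D~^{-1/2} (A + I) D~^{-1/2}
    a_tilde = adjacency + np.eye(adjacency.shape[0])
    inv_sqrt = a_tilde.sum(axis=1) ** -0.5  # every degree is >= 1 thanks to self-loops
    return a_tilde * inv_sqrt[:, None] * inv_sqrt[None, :]


def drop_edges(adjacency, rate, rng):
    # keep each undirected edge with probability 1 - rate (symmetric mask);
    # assumes the input graph has no self-loops, since the mask's diagonal is zero
    if rate == 0:
        return adjacency
    keep = rng.random(adjacency.shape) >= rate
    keep = np.triu(keep, 1)
    keep = keep | keep.T
    return adjacency * keep


rng = np.random.default_rng(0)
adjacency = np.array([[0, 1, 1],
                      [1, 0, 0],
                      [1, 0, 0]], dtype=float)
training_graph = drop_edges(adjacency, 0.5, rng)  # only used while training
W = renormalized_adjacency(training_graph)
print(np.allclose(W, W.T))
```

Applying dropout before renormalization keeps every node's self-loop, so no row of the normalized matrix can become all zeros.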
Since organizing GNN training can prove cumbersome for non-experts,
pygrank integrates helper methods to train GNNs for node classification.
These are abstracted so that calls are the same regardless of the employed
backend. Layer losses are obtained from layer regularizers in tensorflow,
or from a loss value set on the trainable module in pytorch. Training
takes the same arguments across backends, although a custom optimizer, if
provided, must be created with the respective backend.
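Training helpers of this kind typically use the validation nodes for early stopping. The plain-Python sketch below shows that idea in isolation: it runs one training step per epoch and stops once the validation loss has not improved for a given number of epochs. The function and its parameters are hypothetical and only illustrate the technique; they do not reproduce pygrank's training code.

```python
def train_with_patience(train_step, validation_loss, epochs=1000, patience=100):
    """Call train_step once per epoch; stop when the validation loss has not
    improved for `patience` epochs. Returns (best_epoch, best_loss)."""
    best_loss, best_epoch = float("inf"), -1
    for epoch in range(epochs):
        train_step(epoch)
        loss = validation_loss(epoch)
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            # a real trainer would also snapshot the model weights here
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_loss


# toy run: the validation loss improves for five epochs, then plateaus
recorded = [5, 4, 3, 2, 1] + [1.5] * 10
result = train_with_patience(lambda epoch: None, lambda epoch: recorded[epoch],
                             epochs=len(recorded), patience=3)
print(result)
```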
Using the APPNP class defined above, the next snippet loads a dataset with node features and labels, separates nodes into a 60-20-20 training-validation-test split, and trains the architecture with the helper methods:
graph, features, labels = pg.load_feature_dataset('citeseer')
training, test = pg.split(list(range(len(graph))), 0.8)  # 80% training+validation, 20% test
training, validation = pg.split(training, 1-0.2/0.8)  # 75% of the 80% = 60% of all nodes
pg.load_backend('tensorflow')  # explicitly load the appropriate backend
model = APPNP(features.shape[1], labels.shape[1])
pg.gnn_train(model, graph, features, labels, training, validation,
             optimizer=tf.optimizers.Adam(learning_rate=0.01))
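The second split uses 1 - 0.2/0.8 = 0.75 because it operates on the 80% of nodes left after holding out the test set: 0.75 × 0.8 = 0.6 of all nodes go to training and 0.25 × 0.8 = 0.2 to validation. The sketch below checks this arithmetic on 100 node ids with a hypothetical split helper that shuffles and cuts at a fraction; it only mimics the idea of a random fractional split and is not pygrank's pg.split.

```python
import random


def split(items, fraction, seed=0):
    # hypothetical stand-in for a random split: shuffle, then cut at `fraction`
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * fraction)
    return items[:cut], items[cut:]


nodes = range(100)
training, test = split(nodes, 0.8)                      # 80 / 20
training, validation = split(training, 1 - 0.2 / 0.8)   # 0.75 of 80 -> 60 / 20
print(len(training), len(validation), len(test))
```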
Predictions are obtained by calling the model as usual, and pygrank provides
an implementation of GNN accuracy that adapts to the loaded backend:
predictions = model([graph, features])
print("Accuracy", pg.gnn_accuracy(labels, predictions, test))
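For reference, node classification accuracy over a set of nodes is usually computed by comparing the arg-max of each prediction row against the arg-max of the corresponding one-hot label row. The NumPy sketch below assumes one-hot labels and that pg.gnn_accuracy follows this common definition; the function name accuracy is hypothetical.

```python
import numpy as np


def accuracy(labels, predictions, nodes):
    # fraction of the given nodes whose highest-scoring class matches the label
    nodes = np.asarray(nodes)
    predicted = np.argmax(predictions[nodes], axis=1)
    actual = np.argmax(labels[nodes], axis=1)
    return float(np.mean(predicted == actual))


labels = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])          # one-hot labels
predictions = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
print(accuracy(labels, predictions, [0, 1, 2]))  # nodes 0 and 1 correct, node 2 wrong
```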