Classification

To support Graph Neural Network (GNN) architectures, pygrank provides a mechanism for propagating latent representations through graph filters. It takes backend primitives organized into matrices as input, applies the graph filter to each of their columns, and merges the results into matrices of the same dimensions.
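The column-wise behavior described above can be illustrated without any graph library. The following is a minimal NumPy sketch, not pygrank's implementation: the function name `propagate_columns`, the symmetric normalization, and the dense adjacency matrix are all assumptions made for illustration. It runs a fixed number of personalized PageRank iterations on each feature column and stacks the results back into a matrix with the input's shape.

```python
import numpy as np

def propagate_columns(adjacency, features, alpha=0.9, iters=10):
    """Apply a personalized-PageRank-style filter to each column of `features`.

    Hypothetical helper for illustration only; not part of pygrank.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    features = np.asarray(features, dtype=float)
    # Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes get zero weights.
    degrees = adjacency.sum(axis=1)
    inv_sqrt = np.zeros_like(degrees)
    inv_sqrt[degrees > 0] = degrees[degrees > 0] ** -0.5
    normalized = inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]
    columns = []
    for j in range(features.shape[1]):
        personalization = features[:, j]  # one latent dimension acts as the graph signal
        scores = personalization.copy()
        for _ in range(iters):
            scores = alpha * (normalized @ scores) + (1 - alpha) * personalization
        columns.append(scores)
    # Merge the filtered columns into a matrix shaped like the input.
    return np.stack(columns, axis=1)

adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
features = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
propagated = propagate_columns(adjacency, features)
print(propagated.shape)
```

Because the filter is linear, looping over columns gives the same result as a single matrix product per iteration; the loop is kept here only to mirror the description.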

For example, a predict-then-propagate architecture [klicpera2018predict] can be implemented in tensorflow with the following code:

import tensorflow as tf
import pygrank as pg
from tensorflow.keras.layers import Dropout, Dense

pg.load_backend('tensorflow')  # forces propagation through the tensorflow pipeline

class APPNP(tf.keras.Sequential):
    def __init__(self, num_inputs, num_outputs, hidden=64):
        super().__init__([
            Dropout(0.5, input_shape=(num_inputs,)),
            Dense(hidden, activation="relu", kernel_regularizer=tf.keras.regularizers.L2(1.E-5)),
            Dropout(0.5),
            Dense(num_outputs, activation="relu") ])
        self.ranker = pg.PageRank(0.9, renormalize=True, assume_immutability=True,
            error_type="iters", max_iters=10) # force 10 iterations
        self.input_spec = None  # prevents some versions of tensorflow from checking call inputs

    def call(self, inputs, training=False):
        graph, features = inputs
        predict = super().call(features, training=training)
        propagate = self.ranker.propagate(graph, predict, graph_dropout=0.5 if training else 0)
        return tf.nn.softmax(propagate, axis=1)

In the above code, the propagate method of the graph filter performs the propagation step. The filter is also configured to renormalize the adjacency matrix (renormalize=True), and graph dropout is applied in forward computations only while training.
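For readers unfamiliar with these two operations, here is a rough NumPy sketch of what they commonly mean in the GNN literature. It assumes that renormalization adds self-loops before symmetric normalization and that graph dropout zeroes matrix entries at random. The helper names `renormalize` and `graph_dropout` are hypothetical, and whether pygrank rescales the surviving entries or operates on sparse matrices is not covered here.

```python
import numpy as np

def renormalize(adjacency):
    """Add self-loops, then normalize symmetrically: D^{-1/2} (A + I) D^{-1/2}.

    Assumes a non-negative adjacency matrix, so every degree is at least 1.
    """
    with_loops = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))
    inv_sqrt = with_loops.sum(axis=1) ** -0.5
    return inv_sqrt[:, None] * with_loops * inv_sqrt[None, :]

def graph_dropout(matrix, rate, rng):
    """Zero each entry independently with probability `rate` (no rescaling in this sketch)."""
    if rate == 0:
        return matrix  # inference: the graph is left untouched
    keep = rng.random(matrix.shape) >= rate
    return matrix * keep

rng = np.random.default_rng(0)
normalized = renormalize(np.array([[0, 1], [1, 0]]))
dropped = graph_dropout(normalized, 0.5, rng)
print(normalized)
```

Passing a rate of 0 outside training, as the call method above does, leaves the propagation deterministic at prediction time.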

Since organizing GNN training can prove cumbersome for non-experts, pygrank integrates helper methods to train GNNs for node classification. These are abstracted so that calls are the same regardless of the employed backend. Layer losses are obtained from layer regularizers in tensorflow, or from a loss value set on the trainable module in pytorch. Training accepts the same arguments across backends, although a custom optimizer, if provided, must be created with the respective backend.

Given the previous implementation, in the next snippet we load a dataset with node features and labels, separate nodes into a 60-20-20 training-validation-test split and train the APPNP architecture with the helper methods:

graph, features, labels = pg.load_feature_dataset('citeseer')
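training, test = pg.split(list(range(len(graph))), 0.8)  # 80% of all nodes for training+validation, 20% for test
training, validation = pg.split(training, 1-0.2/0.8)  # keep 75% of that 80% for training: 60% training, 20% validation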
pg.load_backend('tensorflow')  # explicitly load the appropriate backend
model = APPNP(features.shape[1], labels.shape[1])
pg.gnn_train(model, graph, features, labels, training, validation,
             optimizer = tf.optimizers.Adam(learning_rate=0.01))

Predictions are made by calling the model as usual, and pygrank provides an implementation of GNN accuracy whose computation is selected based on the active backend:

predictions = model([graph, features])
print("Accuracy", pg.gnn_accuracy(labels, predictions, test))
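As a rough guide to what such an accuracy measure computes, the sketch below compares the argmax of one-hot labels against the argmax of predicted class scores, restricted to a list of node indices. This is an assumption about the usual definition of node classification accuracy, written in plain NumPy. `masked_accuracy` is a hypothetical name and not the pygrank function.

```python
import numpy as np

def masked_accuracy(labels, predictions, nodes):
    """Fraction of `nodes` whose highest-scoring predicted class matches the label."""
    labels = np.asarray(labels)
    predictions = np.asarray(predictions)
    nodes = np.asarray(nodes, dtype=int)
    true_classes = labels[nodes].argmax(axis=1)            # one-hot rows -> class ids
    predicted_classes = predictions[nodes].argmax(axis=1)  # class scores -> class ids
    return float((true_classes == predicted_classes).mean())

labels = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
predictions = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
print(masked_accuracy(labels, predictions, [1, 2, 3]))
```

Evaluating only on the held-out test indices, as in the call above, keeps the reported number free of nodes seen during training or validation.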