Autotuning

Beyond the ability to compare node ranking algorithms, we provide the ability to automatically tune node ranking algorithms or select the best ones with respect to optimizing a measure based on the graph and personalization at hand. This process is abstracted through a pygrank.Tuner base class, which wraps any kind of node ranking algorithm. Ideally, this would wrap end-product algorithms.

Info

Tuners differ from benchmarks in that they select node ranking algorithms on-the-fly based on the graph signal input.

Getting started

An exhaustive list of ready-to-use tuners can be found here. After initialization, these can run with the same pattern as other node ranking algorithms. Tuner instances with default arguments use common base settings. For example, the following code separates training and evaluation data for a provided personalization signal and then uses a tuner that by default creates a GenericGraphFilter instance with ten parameters.

import pygrank as pg

_, graph, group = pg.load_one("eucore")
signal = pg.to_signal(graph, group)

train, test = pg.split(signal, training_samples=0.5)

scores_pagerank = pg.PageRank(max_iters=1000)(train)
scores_tuned = pg.ParameterTuner()(train)

measure = pg.AUC(test, exclude=train)
pg.benchmark_print_line("Pagerank", measure(scores_pagerank))
pg.benchmark_print_line("Tuned", measure(scores_tuned))
# Pagerank           .83
# Tuned              .91

Instead of repeating the whole optimization process each time a tuner runs, you may want to tune once and use the created node ranking algorithm later. This can be achieved with the following pattern:

algorithm_tuned = pg.ParameterTuner().tune(training)
scores_tuned = algorithm_tuned(training)

Customization

Tune your algorithms by passing to the ParameterTuner a method (or lambda expression) that constructs them given a list of parameters. Also provide corresponding upper and lower bounds for the parameters. An example follows:

def custom_algorithm(params): 
    assert len(params) == 1
    return pg.PageRank(alpha=params[0])

algorithm = pg.ParameterTuner(custom_algorithm, 
                                     max_vals=[0.99], 
                                     min_vals=[0.5],
                                     measure=pg.NDCG)

The above snippet used NDCG as the measure of choice for tuning. If no measure is provided, AUC is the default. If the application calls for it and you want to create a measure that is tied to a specific graph signal with the as_supervised_method like below, set fraction_of_training=1 for the tuner. This forces the tuner to use the whole personalization to produce node ranks internally, since we perform the validation split a priori.

import pygrank as pg

_, graph, group = pg.load_one("eucore")
signal = pg.to_signal(graph, group)

train, test = pg.split(signal, training_samples=0.5)
train, valid = pg.split(train, training_samples=0.5)

tuner = pg.ParameterTuner(lambda params: pg.PageRank(alpha=params[0]),  # simpler than declaring a method
                             max_vals=[0.99],
                             min_vals=[0.5],
                             fraction_of_training=1,
                             measure=pg.NDCG(valid, exclude=train+test).as_supervised_method())

scores_pagerank = pg.PageRank(max_iters=1000)(train)
scores_tuned = tuner(train)

Optimizations

Graph convolutions are the most computationally-intensive operations node ranking algorithms employ, as their running time scales linearly with the number of network edges (instead of nodes). However, when tuners aim to optimize algorithms involving graph filters extending the ClosedFormGraphFilter class, graph filtering is decomposed into weighted sums of naturally occurring Krylov space base elements {Mnp, n=0,1,...}. To speed up computation time (by many times in some settings) pygrank provides the ability to save the generation of this Krylov space base so that future runs do not recompute it, effectively removing the need to perform graph convolutions all but once for each personalization.

Info

This speedup can be applied outside of tuners too; explicitly pass a graph signal object to node ranking algorithms.

To enable this behavior, a dictionary needs to be passed to closed form graph filter constructors through an optimization_dict argument. In most cases, tuners are responsible for delegating additional arguments to default algorithms and this can be achieved with the following code.

graph, personalization = ...
optimization_dict = dict()
tuner = pg.ParameterTuner(error_type="iters", 
                              num_parameters=20,
                              max_iters=20,
                              optimization_dict=optimization_dict)
scores = tuner(graph, personalization)

Warning

Similarly to the assume_immutability=True option for preprocessors, the optimization dictionary requires that graphs signals are not altered in the interim. Furthermore, it multiplies (e.g. at least doubles) the amount of used memory, which may be problematic for big graphs. To remove allocated memory (and allow signals to be edited), keep a reference to the dictionary and clear it afterwards with optimization_dict.clear().

Info

The default algorithms constructed by tuners (if none are provided) use pygrank.SelfClearDict instead of a normal dictionary. This clears other entries when a new personalization is inserted, therefore avoiding memory bloat.