Evaluation

pygrank provides a wide breadth of measures that can assess node ranking algorithms, as well as comprehensive code interfaces with which to quickly set up experiments at scale to compare algorithms. An offshoot of these capabilities is autotuning, which is very useful in practice and covered here.

Measures

Provided measures are mainly a) supervised in that they compare graph signal posteriors (the node scores outputted by node ranking algorithms) with some known ground truth ones, or b) unsupervised in that they assess whether posteriors satisfy a desired property, such as low conductance or density. These two types of measures are instantiated with keyword parameters. A list of measures can be found here. We have not addressed some types yet; look for measures that extend the Supervised and Unsupervised base classes. After initialization, measures are callables that output a numeric value given graph signal posteriors.

Info

Supervised measures typically admit two arguments:

known_scores: This is a mandatory first argument that holds a graph signal or data that together with a graph can automatically instantiate a signal. It indicates ground truth information considered as the desired posteriors.
exclude: Optional. A binary graph signal (or, again, data to extrapolate a graph signal) that indicates which nodes to exclude from evaluation. Typically, you will want to exclude all the nodes with non-zero values in the personalization, either with by providing the personalization signal or the list of nodes generating it. If None (default) all nodes are included in the evaluation. You can overwrite this value at any point with measure.exclude = ....

Unsupervised measures may also take an optional graph argument. This helps convert provided data to a graph signal internally when calling the measure with not a signal.

An example of how to use measures follows. In that we also used Threshold postprocessor to convert scores to binary values, but measures also accept non-crisp scores that have intermediate values; in this case stochastic equivalents are computed.

import pygrank as pg
import networkx as nx

graph = nx.Graph()
graph.add_edge('A', 'B')
graph.add_edge('A', 'C')
graph.add_edge('C', 'D')
graph.add_edge('D', 'E')
signal = pg.to_signal(graph, {'A': 3, 'C': 2})

algorithm = pg.HeatKernel(t=5) >> pg.Normalize("max") >> pg.Threshold(0.5)
scores = algorithm(signal)
print(scores)  # {'A': 1.0, 'B': 1.0, 'C': 1.0, 'D': 1.0, 'E': 0.0}

supervised = pg.Accuracy({'D': 1}, exclude=signal)  #  or exclude=['A', 'C']
print(supervised(scores))  # 0.6666666666666667

unsupervised = pg.Density()  # lower is better
print(unsupervised(scores))  # 0.5

In addition to base measures, there also exist c) variants that combine them, and d) evaluation that runs on multiple communities. To begin with, multiple measures can be aggregated through the pg.AM and pg.GM classes, which respectively perform arithmetic and geometric averaging of assessment values. After instantiateing these classes, measures can be added to them with a chain of .add method calls that also accept optional weights and saturation thresholds. Here is an example:

import pygrank as pg

known_scores, algorithm, personalization, sensitivity_scores = ...
auc = pg.AUC(known_scores, exclude=personalization)
prule = pg.pRule(sensitivity_scores, exclude=personalization)
measure = pg.AM(differentiable=True).add(auc).add(prule, weight=10., max_val=0.8)

Info

When working on autodifferntiation backends, the constructor of the measure combination mechanism can be annotated with differentiable=True. This indicates that, when thresholds are provided, a differentiable relaxation of saturation should be applied instead of a simple min or max.

Extension of base measures to multiple communities can be obtained from the pg.MultiSupervised and pg.MultiUnsupervised classes. These take as arguments the classes of base measures and any keyword arguments needed to construct them. In case of the exclude attribute of supervised measures a dictionary from community identifiers (like strings with community names) to graph signals/signal data should be provided to the pg.MultiSupervised. When called, these measures process and output dictionaries from community identifiers to create an assessment value for each of those communities. Values are also returned as a similar dictionary. Practical code using these classes may look like this:

import pygrank as pg

_, graph, communities = next(pg.load_datasets_multiple_communities(["dblp"]))
algorithm = pg.PageRank(alpha=0.9, assume_immutability=True)  # cache graph preprocessing
algorithm = algorithm >> pg.Normalize("max")
comm_scores = {name: algorithm(graph, members) for name, members in communities.items()}

measure = pg.MultiUnsupervised(pg.Conductance)
print(measure(comm_scores))

In case of unsupervised evaluation you may also be interested in computing link-based unsupervised measures. This is done with the pg.LinkAssessmen class; this needs the graph as a constructor argument and can only be computed when multiple communities are analysed together. However, it often performs a qualitative evaluation closer to supervised assessment [krasanakis2020unsupervised]. Continuing from the previous snippet, evaluation of scores with this strategy can be performed like so:

import tqdm  # install this to be able to set it as a progress bar argument below

measure = pg.LinkAssessment(graph, progress=tqdm.tqdm)  # graph argument is mandatory
print(measure(comm_scores))

Info

Link-based evaluation or, in general, some kind of unsupervised evaluation is needed when there are only a few community members that serve as examples towards which to score proximity. In this case, splitting example members to training and test sets risks degrading supervised evaluation quality.

Datasets

pygrank provides a variety of datasets to be automatically downloaded and imported. Messages in the error console point to respective dataset sources. You can also add your own datasets by placing them in a visible directory (a data directory either under the project or under home/.pygrank/data where automatically downloaded datasets are stored). Please refer to the repository of datasets generated for pygrank for a description of conventions needed to create new datasets; broadly, these follow the format of the SNAP repository and include pairs.txt and groups.txt file.

A summary of all datasets which are automatically downloaded by the project and their available types of data can be found here. A lists of all dataset names can be obtained programmatically with the method downloadable_small_datasets(), but you can also use downloadable_small_datasets() to get a list of dataset names on datasets that run fastly (within seconds instead of minutes or tens of minutes) in modern machines. Lists of dataset names like these are provided to methods that iteratively yield loaded data. Five methods with the ability to load different amounts of information are provided below. For each method, datasets lacking needed information are silently omitted.

Method	Description
`load_one`	Gets as an input one dataset name and returns a tuple of that name, the graph, and the first group found by `load_datasets_one_community`.
`load_datasets_graph`	Yields respective dataset graphs.
`load_datasets_one_community`	Yields tuples of dataset names, graphs and node lists, where the lists correspond to one of the (structural or metadata) communities of graph nodes.
`load_datasets_all_communities`	Yields the same tuples as before, but also traverses all possible datasets node communities (thus, there are many loading outcomes for each dataset). Community identifiers are appended to dataset names.
`load_datasets_multiple_communities`	Yields tuples of dataset names, graphs and hashmaps, where the latter map community identifiers to lists of nodes.
`load_feature_dataset`	Yields respective tuples of graphs, features and node labels, where the last two are organized into numpy arrays whose lines correspond to graph nodes (based on the order the graphs are traversed).

All dataset loading methods admit the following list of keyword parameters:

path: This is directory location (default is "data") to seach for datasets and download them in. Take care to not add a trailing slash.
min_group_size: The minimum number of nodes for each group to accept (default is 0.01). Groups with fewer nodes are ignored. If a value less than 1 is provided, it is multiplied with the number of graph nodes.
min_group_id: The
max_group_number: The maximum number of communities to load; stops at this number. Default is 20.
prepend_all_nodes: If True, creates a first community that holds all graph nodes. Default is False. graph_api=nx,

The following code can be used to load datasets for overlapping community detection given that each node community should be experimented on separately.

import pygrank as pg

datasets = pg.downloadable_small_datasets()
print(datasets)  # ['bigraph', 'blockmodel', 'citeseer', 'eucore', 'graph5', 'graph9']

for dataset, graph, group in pg.load_datasets_all_communities(datasets):
    print(dataset, ":", len(group), "community members", len(graph), "nodes",  graph.number_of_edges(), "edges")

# REQUIRED CITATION: Please visit the url https://github.com/maniospas/pygrank-datasets for instructions on how to cite the dataset bigraph in your research
# bigraph0 : 300 community members 600 nodes 27092.0 edges
# bigraph1 : 300 community members 600 nodes 27092.0 edges
# ...

Benchmarks

pygrank offers the ability to conduct benchmark experiments that compare many node ranking algorithms and parameters on a wide range of graphs. A simple way to obtain some fastly-running algorithms and small datasets and compare them under the AUC measure would be per:

import pygrank as pg

dataset_names = pg.downloadable_small_datasets()
print(dataset_names)  # ['citeseer', 'eucore']

algorithms = pg.create_demo_filters()
print(algorithms.keys())  # dict_keys(['PPR.85', 'PPR.9', 'PPR.99', 'HK3', 'HK5', 'HK7'])

loader = pg.load_datasets_one_community(dataset_names)
pg.benchmark_print(pg.benchmark(algorithms, loader, pg.AUC))

#                    PPR.85      PPR.9       PPR.99      HK3     HK5     HK7 
# citeseer           .89         .89         .89         .89     .89     .89 
# eucore             .85         .71         .71         .91     .89     .83 
# graph9             1.00        1.00        1.00        1.00    1.00    1.00
# bigraph            .96         .77         .77         1.00    .98     .86 
# REQUIRED CITATION: Please visit the url https://linqs.soe.ucsc.edu/data for instructions on how to cite the dataset citeseer in your research
# REQUIRED CITATION: Please visit the url https://snap.stanford.edu/data/email-Eu-core.html for instructions on how to cite the dataset eucore in your research
# REQUIRED CITATION: Please visit the url https://github.com/maniospas/pygrank-datasets for instructions on how to cite the dataset graph5 in your research
# REQUIRED CITATION: Please visit the url https://github.com/maniospas/pygrank-datasets for instructions on how to cite the dataset graph9 in your research
# REQUIRED CITATION: Please visit the url https://github.com/maniospas/pygrank-datasets for instructions on how to cite the dataset bigraph in your research

You can apply a postprocessor to a dictionary of node ranking algorithms like below. You can apply several postprocessors at once, for example if the end-goal is to compare between them. When apply prostprocessors an masse, a prefix is declared to prepend to algorithm names (e.g., the HK5Sweep variation of HK5 will be created).

sweeps = pg.create_variations(algorithms, {"Sweep": pg.Sweep})
algorithms = algorithms | sweeps  # merge old and new algorithms

For massive experimentation, generate many algorithms and variations with the following methods:

postprocessors = pg.create_many_variation_types()
filters = pg.create_many_filters()
algorithms = pg.create_variations(filters, postprocessors)

Specific algorithms can also be added or used in place of the algorithms dictionary. For example, add an automatically-tuned algorithm (more on these later) with default parameters per the following code and then re-run experiments to see how well it fares..

algorithms["Tuned"] = pg.ParameterTuner()

Warning

To run a new series of experiments, the loader needs to be called anew (it is an iterator).

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search