Setup

Create a (virtual) environment with Python 3.9 or later and install or upgrade to the latest version of pygrank with:

pip install --upgrade pygrank

Creating graphs

When working on practical problems, use networkx to construct graphs by adding edges between Python objects. For example, you can construct a graph that pygrank can process with the following pattern, which we use throughout our documentation for ease of development:

import networkx as nx

graph = nx.Graph()  # nx.Graph is always undirected; use nx.DiGraph() for directed graphs
graph.add_edge('A', 'B')
graph.add_edge('A', 'C')

Graphs like the above require a lot of memory to keep track of relations between data, which can be an issue when processing large graphs. However, pygrank typically only needs to convert those graphs to sparse matrices of the respective backends. For this reason, the library provides its own trimmed-down pygrank.Graph class that implements the subset of graph operations needed for node ranking algorithms and speeds up the add_edge method. Instances of this class can be created with the pattern:

import pygrank as pg

graph = pg.Graph(directed=False)  # undirected is also the default
graph.add_edge('A', 'B')
graph.add_edge('A', 'C')
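
To make the conversion to sparse matrices more concrete, here is a minimal sketch of how a lightweight graph container could map arbitrary Python objects to integer indices and export its edges as coordinate triplets. The TinyGraph class and its to_coo method are hypothetical names used only for illustration; they are not part of pygrank or networkx, and pygrank.Graph may well be implemented differently.

```python
class TinyGraph:
    """Illustrative graph container; not pygrank's actual implementation."""

    def __init__(self, directed=False):
        self.directed = directed
        self._index = {}  # node object -> integer id, in insertion order
        self._edges = []  # list of (source id, target id) pairs

    def _node_id(self, node):
        # Assign the next free integer id the first time a node is seen.
        return self._index.setdefault(node, len(self._index))

    def add_edge(self, u, v):
        self._edges.append((self._node_id(u), self._node_id(v)))

    def to_coo(self):
        """Return (rows, cols, values) triplets of the adjacency matrix."""
        rows, cols = [], []
        for i, j in self._edges:
            rows.append(i)
            cols.append(j)
            if not self.directed and i != j:
                # Undirected edges appear symmetrically in the matrix.
                rows.append(j)
                cols.append(i)
        return rows, cols, [1.0] * len(rows)


graph = TinyGraph()
graph.add_edge('A', 'B')
graph.add_edge('A', 'C')
rows, cols, values = graph.to_coo()
print(rows, cols, values)
```

Triplets of this kind are the usual input for building a sparse matrix, for instance with scipy.sparse.coo_matrix((values, (rows, cols)), shape=(n, n)).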

Backends

Several popular computational backends are supported. To avoid bloating the main package, these should be installed separately as needed. Only "numpy" can be used immediately out of the box as the default. Find instructions on how to install and enable the rest below.

Info

First-time users can stick to the default and skip the rest of this section. However, setting up graph analysis on GPU with respective backends can be hundreds of times faster.

To switch between backends, either use the load_backend(name) command or define an execution context that temporarily switches to the specified backend and then reverts to the previous one. The execution context is the recommended approach, as demonstrated below. Switching backends only affects how new operations are executed. Data types are automatically converted as needed, and caching optimizations are tied to the backend.

import pygrank as pg

algorithm = pg.PageRank()
with pg.Backend("tensorflow"):  # tensorflow needs to be installed
    scores = algorithm(...)
    print(scores.np)  # a tensor
print(scores.np)  # an array now that we switched back

When importing pygrank, a message appears indicating that "numpy" is the default backend. The same message points to a JSON configuration file stored under home/.pygrank, alongside any automatically downloaded content. The configuration file specifies the default backend to be set upon the library's first import, initialization parameters for that backend, and the option to silence the reminder message. These options can either be edited directly in the file or set programmatically with:

pg.set_backend_preference(name, reminder=True, **init)  # essentially calls pg.load_backend(name, **init) on pygrank's first import

The init dictionary holds parameters passed to backend initialization. The configuration file's contents look like this:

{
  "backend": "numpy", 
  "reminder": "true",
  "init": {}
}
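
As a sketch of editing such a file by hand, the snippet below reads a configuration with the same three keys, changes the preferred backend, and writes it back. The file name config.json and the temporary directory are assumptions made for the sake of a runnable example; use the path reported by pygrank's import message for the real file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical location: a temporary directory stands in for the real
# configuration folder, and "config.json" is an assumed file name.
config_path = Path(tempfile.mkdtemp()) / "config.json"
initial = {"backend": "numpy", "reminder": "true", "init": {}}
config_path.write_text(json.dumps(initial, indent=2))

config = json.loads(config_path.read_text())

# The contents shown above store the reminder flag as the string "true",
# so normalize it to a boolean before using it.
reminder = str(config["reminder"]).lower() == "true"

# Change the preferred backend and its initialization parameters.
config["backend"] = "pytorch"
config["init"] = {"mode": "sparse", "device": "cpu"}
config_path.write_text(json.dumps(config, indent=2))

updated = json.loads(config_path.read_text())
print(updated["backend"], updated["init"], reminder)
```

The entries placed under "init" correspond to the keyword arguments that would otherwise be passed when loading the backend.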

Below is a list of supported backends with installation instructions and comments.

numpy

About
This is the default backend and is enabled by default. Internally, it employs scipy for sparse-dense matrix operations. All other backends rely on scipy sparse matrices as an intermediate step when initializing their own sparse matrix types. This backend is best suited to general-purpose numerical computations and handling very large graphs with memory efficiency, but is not the fastest option.
Links
numpy
scipy

tensorflow

About
Performs computations within the tensorflow execution environment. The latter is an open-source platform for machine learning developed by the Google Brain team. There are two modes in which this backend can be executed: "dense" (default) and "sparse". The mode may be provided as additional arguments to the backend loading call like this:

import pygrank as pg
with pg.Backend("tensorflow", mode="dense", device="auto"):
    ...  # code to run on tensorflow here

In dense mode, the tensorflow backend attempts to store graphs in dense square matrices that take full advantage of tensorflow's parallelization. If there is not enough memory to allocate a dense adjacency matrix, the backend generates a sparse version and issues a warning. The backend's initialization also accepts a device string or object to which computations should be internally transferred. If provided, this needs to be a tensorflow device name.
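
The memory gap between the two modes can be estimated with simple arithmetic. The sketch below assumes 4-byte (float32) values and a coordinate-style sparse layout with two 8-byte indices per stored entry; both are illustrative assumptions, and the actual footprint depends on the backend's data types and sparse format.

```python
def dense_bytes(num_nodes, bytes_per_value=4):
    """Memory of a dense num_nodes x num_nodes matrix (float32 assumed)."""
    return num_nodes * num_nodes * bytes_per_value


def sparse_coo_bytes(num_entries, bytes_per_value=4, bytes_per_index=8):
    """Memory of a coordinate-format sparse matrix.

    Each stored entry needs a row index, a column index, and a value.
    """
    return num_entries * (2 * bytes_per_index + bytes_per_value)


nodes = 100_000
entries = 1_000_000  # stored non-zero entries, about 10 per node

print(f"dense:  {dense_bytes(nodes) / 1e9:.1f} GB")
print(f"sparse: {sparse_coo_bytes(entries) / 1e6:.1f} MB")
```

Under these assumptions, dense storage grows quadratically with the number of nodes, while sparse storage grows with the number of stored entries.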
Installation
pip install tensorflow[and-cuda]
On Windows install WSL2 (Windows Subsystem for Linux) first.
Links
tensorflow

pytorch

About
Performs computations within the pytorch execution environment. The latter is an open-source platform for machine learning developed by Meta's AI Research lab. Similarly to "tensorflow", there are two modes in which this backend can be executed: "dense" (default) and "sparse". The mode may be provided as an additional argument to the backend loading call like this:

import pygrank as pg
with pg.Backend("pytorch", mode="dense", device="auto"):
    ... # code to run on pytorch

In dense mode, the pytorch backend attempts to store graphs in dense square matrices that take full advantage of pytorch's device parallelization. If there is not enough memory to allocate a dense adjacency matrix, the backend generates a sparse version and issues a warning. The backend's initialization also accepts a device string or object to which computations should be internally transferred. If provided, this needs to be one of pytorch's available devices (typically "cuda" or "cpu"). If not provided, the device will be the same as the one selected the last time this backend was loaded. If this is the first time, the device will be automatically selected to be "cuda" if the latter is properly integrated, and "cpu" otherwise.
Installation
For full installation instructions visit pytorch's website in the links below.
Links
pytorch

torch_sparse

About
Performs computations within the pytorch execution environment, an open-source platform for machine learning developed by Meta's AI Research lab, but, contrary to the "pytorch" backend, uses the sparse computations of the torch_sparse library. This backend always executes in sparse mode and its initialization accepts a device string or object to which computations should be internally transferred. This follows the same conventions as "pytorch" to determine the employed device. For example, use this backend like this:

import pygrank as pg
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
with pg.Backend("torch_sparse", device=device): 
    ...  # code to run on torch_sparse 

Info

"torch_sparse" is near-identical to "pytorch" in sparse mode but is much faster at preprocessing adjacency matrices.

Installation
For full installation instructions visit the pytorch and torch_sparse websites in the links below.
Links
pytorch
torch_sparse

matvec

About
Offers multithreaded implementations and memory reuse that are much faster than "numpy" when processing extremely sparse graphs. It is very fast when the number of edges is a small multiple of the number of nodes, but is slower than other backends for dense graphs.
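
A quick way to see which regime a graph falls into is to compare its edge count with its node count. The helper below is a hypothetical illustration, not part of pygrank or matvec; it treats density as edges divided by the square of the number of nodes, which is a simplification that counts ordered node pairs including self-loops.

```python
def sparsity_profile(num_nodes, num_edges):
    """Return (average edges per node, fraction of node pairs that are edges)."""
    edges_per_node = num_edges / num_nodes
    density = num_edges / (num_nodes * num_nodes)
    return edges_per_node, density


# Five edges per node on average: the edge count is a small multiple of the node count.
print(sparsity_profile(1_000_000, 5_000_000))
# A tenth of all node pairs are connected: comparatively dense.
print(sparsity_profile(10_000, 10_000_000))
```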
Installation
pip install matvec
Links
matvec

dask

About
Offers the distributed computational model of dask.distributed. Enables distributed computing and parallel processing, making it ideal for very large graphs that need to be processed in a distributed manner. This backend's instantiation accepts a keyword argument chunks=8 to denote the number of chunks into which sparse matrices are split (the maximum number of engaged distributed workers), as well as additional positional and keyword arguments to pass to the dask client's constructor.
Installation
pip install dask[distributed]
Links
dask.distributed

sparse_dot_mkl

About
Runs computations on parallelized scipy multiplications. Provides speedups for sparse matrix multiplications by utilizing optimized MKL routines. Best suited when Intel's hardware and software stack is available.
Installation
pip install sparse_dot_mkl
Links
mkl

Info

If you use Intel's Python distribution, "sparse_dot_mkl" is only marginally faster than "numpy".