Link Prediction
1. Load nx graph
We are going to create a networkx graph. This cannot be of other
types, such as
pygrank.fastgraph because, they do not implement the remove_edge method.
Doing this with automatic loaders consists of providing the library
from which the graph generation object will be obtained.
import pygrank as pg
import networkx as nx
graph = next(pg.load_datasets_graph(["citeseer"], graph_api=nx))
2. Define predictor
We now define a link predictor on a graph for a node. For the actual predictions, we will get the top scored nodes to link to. We put last the already linked nodes, and eventually keep only the top ones.
def link_prediction(graph, node, top=5):
algorithm = pg.PageRank(0.95) >> pg.SeedOversampling() # or try pg.HeatKernel(5) >> pg.SeedOversampling("neighbors")
ranks = algorithm(graph, {node: 1})
return sorted(graph, key=lambda v: -ranks[v] if not graph.has_edge(node, v) else 0)[:top]
3. Evaluate
To evaluate the link prediction strategy we obtain a test subset of
each node's neighbors and then temporarily remove some of them with
remove_edge. Then, the recommendation takes place and precisions
and recalls are hand-computed. Finally, the removed edges are
re-insered before visiting the next node.
precisions = list()
recalls = list()
for node in graph:
if graph.degree(node) < 10:
continue
_, test = pg.split(list(graph.neighbors(node)))
test = set(test)
for v in test:
graph.remove_edge(node, v)
recommendation = link_prediction(graph, node)
TP = len([v for v in recommendation if v in test])
precisions.append(TP/len(recommendation))
recalls.append(TP/len(test))
for v in test:
graph.add_edge(node, v)
print("Avg. precision", sum(precisions)/len(precisions)) # 0.12631578947368416
print("Avg. recall", sum(recalls)/len(recalls)) # 0.22215956558061817