pyrdf2vec.samplers.pagerank module

class pyrdf2vec.samplers.pagerank.PageRankSampler(inverse=False, split=False, *, alpha=0.85)

Bases: pyrdf2vec.samplers.sampler.Sampler

PageRank node-centric sampling strategy that prioritizes walks containing the most frequent objects. Frequency is expressed by assigning a higher weight to the most frequent objects according to their PageRank ranking.
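To illustrate how such weights can steer a walk, the minimal sketch below draws the next hop with probability proportional to a PageRank-like score of its object. The scores, the hop tuples, and the `pick_hop` helper are hypothetical and are not part of pyrdf2vec.

```python
import random
from collections import Counter

# Hypothetical PageRank-like scores for three objects (illustrative values).
scores = {"Paris": 0.6, "Bob": 0.3, "Bike": 0.1}

# Candidate hops leaving one vertex, each in (predicate, object) form.
hops = [("knows", "Bob"), ("livesIn", "Paris"), ("owns", "Bike")]


def pick_hop(hops, scores, rng):
    """Draw one hop with probability proportional to its object's score."""
    weights = [scores[obj] for _, obj in hops]
    return rng.choices(hops, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so repeated runs give the same draws
counts = Counter(pick_hop(hops, scores, rng)[1] for _ in range(10_000))
print(counts.most_common())
```

Over many draws, hops that lead to higher-scored objects are selected more often, which is the bias the sampling strategy is described as introducing.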

_is_support_remote

True if the sampling strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to False.

_pageranks

The PageRank dictionary. Defaults to {}.

_random_state

The random state used to keep the sampling strategy reproducible. Defaults to None.

_vertices_deg

The degree of the vertices. Defaults to {}.

_visited

Tags vertices that appear at the maximum depth or whose children are all tagged. Defaults to an empty set.

alpha

The damping factor for PageRank. Defaults to 0.85.

inverse

True if the inverse algorithm must be used, False otherwise. Defaults to False.

split

True if the split algorithm must be used, False otherwise. Defaults to False.
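The reference text does not spell out the inverse or split algorithms. The sketch below shows one plausible reading, stated purely as an assumption: `inverse` mirrors each weight within the observed range so that low-ranked objects are favored, and `split` divides each weight by the degree of the object. The `adjust_weights` helper and its inputs are hypothetical.

```python
def adjust_weights(weights, degrees, inverse=False, split=False):
    """Hypothetical post-processing of raw hop weights (assumed formulas)."""
    if inverse:
        # Mirror each weight within [min, max] so low weights become high.
        low, high = min(weights), max(weights)
        weights = [high - (w - low) for w in weights]
    if split:
        # Divide by the degree of the object to dampen highly connected ones.
        weights = [w / d for w, d in zip(weights, degrees)]
    # Normalize so the weights can be used as probabilities.
    total = sum(weights)
    return [w / total for w in weights]


raw = [0.6, 0.3, 0.1]   # illustrative raw weights for three hops
degrees = [3, 1, 2]     # illustrative degrees of the three objects

plain = adjust_weights(raw, degrees)
inverted = adjust_weights(raw, degrees, inverse=True)
shared = adjust_weights(raw, degrees, split=True)
print(plain, inverted, shared)
```

With these inputs, the inverted variant ranks the third hop above the first, and the split variant ranks the low-degree second hop above the high-degree first one.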

fit(kg)

Fits the sampling strategy by running PageRank on the provided KG with the specified damping factor.

Parameters

kg (KG) – The Knowledge Graph.

Return type

None
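As a rough illustration of what running PageRank with a damping factor involves, the sketch below implements a plain power iteration over a small directed edge list. It is a self-contained approximation for explanatory purposes, not the library's implementation; the `pagerank` function and the toy edges are hypothetical.

```python
def pagerank(edges, alpha=0.85, iterations=100):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - alpha + alpha * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        # Each node passes a damped share of its rank along its out-edges.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += alpha * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Toy graph: "c" has two incoming edges, "b" has none.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "a")]
ranks = pagerank(edges, alpha=0.85)
print(ranks)
```

A lower `alpha` pulls all scores toward the uniform value, while a higher `alpha` lets the link structure dominate.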

get_weight(hop)

Gets the weight of a hop in the Knowledge Graph.

Parameters

hop (Tuple[Any, Any]) – The hop, in (predicate, object) form, whose weight is requested.

Return type

float

Returns

The weight of a given hop.

Raises

ValueError – If the weight of a hop is requested before the sampling strategy has been fitted.
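The fit-before-use contract described above can be sketched with a small stand-in class. `ToyPageRankSampler` is hypothetical and is not the pyrdf2vec class: it takes precomputed scores instead of a KG, and it assumes that the weight of a hop is the score of the hop's object.

```python
class ToyPageRankSampler:
    """Hypothetical stand-in that mimics the documented error behavior."""

    def __init__(self):
        self._pageranks = {}

    def fit(self, pageranks):
        # A real sampler would derive these scores from a KG; here they
        # are passed in directly to keep the example self-contained.
        self._pageranks = dict(pageranks)

    def get_weight(self, hop):
        if not self._pageranks:
            raise ValueError("fit the sampler before calling get_weight")
        _, obj = hop  # hop is a (predicate, object) pair
        return self._pageranks[obj]


sampler = ToyPageRankSampler()
try:
    sampler.get_weight(("knows", "Bob"))  # not fitted yet
except ValueError as err:
    message = str(err)

sampler.fit({"Bob": 0.3, "Paris": 0.7})
weight = sampler.get_weight(("knows", "Bob"))
print(message, weight)
```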