pyrdf2vec.samplers package¶

Submodules¶

Module contents¶

class pyrdf2vec.samplers.ObjFreqSampler(inverse=False, split=False)¶

Bases: pyrdf2vec.samplers.sampler.Sampler

Object Frequency Weight node-centric sampling strategy which prioritizes walks containing edges with the highest-degree objects. The degree of an object is defined by the number of predicates present in its neighborhood.

Attributes:

_counts: The counter for vertices.
Defaults to defaultdict.

_is_support_remote: True if the sampling strategy can be used with a
remote Knowledge Graph, False otherwise. Defaults to False.

_random_state: The random state to use to keep random determinism with
the sampling strategy. Defaults to None.

_vertices_deg: The degree of the vertices.
Defaults to {}.

_visited: Tags vertices that appear at the max depth or of which all
their children are tagged. Defaults to set.

inverse: True if the inverse algorithm must be used, False otherwise.
Defaults to False.

split: True if the split algorithm must be used, False otherwise.
Defaults to False.

fit(kg)¶

Fits the sampling strategy by counting the number of parent predicates present in the neighborhood of each vertex.

Parameters: kg (KG) – The Knowledge Graph.
Return type: None

get_weight(hop)¶

Gets the weight of a hop in the Knowledge Graph.

Parameters: hop (Tuple[Any, Any]) – The hop of a vertex, in (predicate, object) form, whose weight is requested.
Return type: int
Returns: The weight of a given hop.
Raises: ValueError – If there is an attempt to access the weight of a hop without the sampling strategy having been trained.
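
The counting idea behind this strategy can be sketched in plain Python. This is only an illustration, not pyrdf2vec's implementation: the triples list and the helper names `count_object_frequency` and `object_weight` are assumptions made for the example, and the "degree" of an object is approximated here by the number of incoming edges.

```python
from collections import defaultdict

def count_object_frequency(triples):
    """Count, for each object, how many (subject, predicate) edges point to it."""
    counts = defaultdict(int)
    for _subject, _predicate, obj in triples:
        counts[obj] += 1
    return counts

def object_weight(hop, counts):
    """Return the weight of a (predicate, object) hop from the fitted counts."""
    if not counts:
        # Mirrors the documented behaviour: no weights before fitting.
        raise ValueError("The sampling strategy must be fitted first.")
    _predicate, obj = hop
    return counts.get(obj, 0)

# Hypothetical toy graph as (subject, predicate, object) triples.
triples = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("alice", "livesIn", "paris"),
]
object_counts = count_object_frequency(triples)
```

In this toy graph, a hop ending in "bob" gets twice the weight of a hop ending in "paris", so walks through "bob" are favoured.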

class pyrdf2vec.samplers.ObjPredFreqSampler(inverse=False, split=False)¶

Bases: pyrdf2vec.samplers.sampler.Sampler

Predicate-Object Frequency Weight edge-centric sampling strategy which prioritizes walks containing edges with the highest degree of (predicate, object) relations. The degree of such a relation is defined by the number of times the (predicate, object) relation occurs in a Knowledge Graph.

_counts¶

The counter for vertices. Defaults to defaultdict.

Type: DefaultDict[Tuple[str, str], int]

_is_support_remote¶: True if the sampling strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to False.

_random_state¶: The random state to use to keep random determinism with the sampling strategy. Defaults to None.

_vertices_deg¶: The degree of the vertices. Defaults to {}.

_visited¶: Tags vertices that appear at the max depth or of which all their children are tagged. Defaults to set.

inverse¶: True if the inverse algorithm must be used, False otherwise. Defaults to False.

split¶: True if the split algorithm must be used, False otherwise. Defaults to False.

fit(kg)¶

Fits the sampling strategy by counting the number of occurrences of an object belonging to a subject.

Parameters: kg (KG) – The Knowledge Graph.
Return type: None
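
A minimal sketch of counting (predicate, object) relations, assuming the graph is available as a list of (subject, predicate, object) string triples. The function name and the data are hypothetical, and the code is not taken from pyrdf2vec.

```python
from collections import Counter

def count_pred_obj_pairs(triples):
    """Count how often each (predicate, object) pair occurs in the triples."""
    return Counter((predicate, obj) for _subject, predicate, obj in triples)

# Hypothetical toy graph as (subject, predicate, object) triples.
triples = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("alice", "livesIn", "paris"),
]
pair_counts = count_pred_obj_pairs(triples)
```

Here the pair ("knows", "bob") occurs twice and ("livesIn", "paris") once, so the first hop would receive the larger weight.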

get_weight(hop)¶

Gets the weight of a hop in the Knowledge Graph.

Parameters: hop (Tuple[Any, Any]) – The hop of a vertex, in (predicate, object) form, whose weight is requested.
Return type: int
Returns: The weight of a given hop.
Raises: ValueError – If there is an attempt to access the weight of a hop without the sampling strategy having been trained.

class pyrdf2vec.samplers.PageRankSampler(inverse=False, split=False, *, alpha=0.85)¶

Bases: pyrdf2vec.samplers.sampler.Sampler

PageRank node-centric sampling strategy which prioritizes walks containing the most frequent objects. This frequency is defined by assigning a higher weight to the most frequent objects using the PageRank ranking.

_is_support_remote¶: True if the sampling strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to False.

_pageranks¶: The PageRank dictionary. Defaults to {}.

_random_state¶: The random state to use to keep random determinism with the sampling strategy. Defaults to None.

_vertices_deg¶: The degree of the vertices. Defaults to {}.

_visited¶: Tags vertices that appear at the max depth or of which all their children are tagged. Defaults to set.

alpha¶: The damping for PageRank. Defaults to 0.85.

inverse¶: True if the inverse algorithm must be used, False otherwise. Defaults to False.

split¶: True if the split algorithm must be used, False otherwise. Defaults to False.

fit(kg)¶

Fits the sampling strategy by running PageRank on a provided KG according to the specified damping.

Parameters: kg (KG) – The Knowledge Graph.
Return type: None
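
To illustrate what the damping factor alpha does, here is a small power-iteration PageRank in plain Python. It is a sketch under stated assumptions: the graph is a list of (subject, predicate, object) triples, predicates are ignored, and the function `pagerank` and its `iterations` parameter are invented for this example. The library may compute PageRank differently.

```python
def pagerank(triples, alpha=0.85, iterations=50):
    """Compute PageRank scores by power iteration over subject -> object edges."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    out_links = {node: [] for node in nodes}
    for subject, _predicate, obj in triples:
        out_links[subject].append(obj)

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(ranks[node] for node in nodes if not out_links[node])
        new_ranks = {
            node: (1.0 - alpha) / n + alpha * dangling / n for node in nodes
        }
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_ranks[target] += alpha * ranks[node] / len(targets)
        ranks = new_ranks
    return ranks

# Hypothetical toy graph as (subject, predicate, object) triples.
triples = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("alice", "livesIn", "paris"),
]
ranks = pagerank(triples, alpha=0.85)
```

The scores sum to 1, and "bob", which has two incoming edges, ranks above "alice", which has none.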

get_weight(hop)¶

Gets the weight of a hop in the Knowledge Graph.

Parameters: hop (Tuple[Any, Any]) – The hop of a vertex, in (predicate, object) form, whose weight is requested.
Return type: float
Returns: The weight of a given hop.
Raises: ValueError – If there is an attempt to access the weight of a hop without the sampling strategy having been trained.

class pyrdf2vec.samplers.PredFreqSampler(inverse=False, split=False)¶

Bases: pyrdf2vec.samplers.sampler.Sampler

Predicate Frequency Weight edge-centric sampling strategy which prioritizes walks containing edges with the highest-degree predicates. The degree of a predicate is defined by the number of times the predicate occurs in a Knowledge Graph.

_counts¶

The counter for vertices. Defaults to defaultdict.

Type: DefaultDict[str, int]

_is_support_remote¶: True if the sampling strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to False.

_random_state¶: The random state to use to keep random determinism with the sampling strategy. Defaults to None.

_vertices_deg¶: The degree of the vertices. Defaults to {}.

_visited¶: Tags vertices that appear at the max depth or of which all their children are tagged. Defaults to set.

inverse¶: True if the inverse algorithm must be used, False otherwise. Defaults to False.

split¶: True if the split algorithm must be used, False otherwise. Defaults to False.

fit(kg)¶

Fits the sampling strategy by counting the number of times each predicate occurs in the Knowledge Graph.

Parameters: kg (KG) – The Knowledge Graph.
Return type: None
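
Counting predicate occurrences can be sketched as follows, again assuming a hypothetical list of (subject, predicate, object) triples rather than a pyrdf2vec KG object. The helper name is illustrative only.

```python
from collections import Counter

def count_predicates(triples):
    """Count how often each predicate occurs in the triples."""
    return Counter(predicate for _subject, predicate, _obj in triples)

# Hypothetical toy graph as (subject, predicate, object) triples.
triples = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("alice", "livesIn", "paris"),
]
predicate_counts = count_predicates(triples)
```

With these counts, any hop that uses the "knows" predicate weighs 2, and any hop that uses "livesIn" weighs 1, regardless of the object reached.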

get_weight(hop)¶

Gets the weight of a hop in the Knowledge Graph.

Parameters: hop (Tuple[Any, Any]) – The hop of a vertex, in (predicate, object) form, whose weight is requested.
Return type: int
Returns: The weight of a given hop.
Raises: ValueError – If there is an attempt to access the weight of a hop without the sampling strategy having been trained.

class pyrdf2vec.samplers.Sampler(inverse=False, split=False)¶

Bases: abc.ABC

Base class of the sampling strategies.

_is_support_remote¶: True if the sampling strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to False.

_random_state¶: The random state to use to keep random determinism with the sampling strategy. Defaults to None.

_vertices_deg¶: The degree of the vertices. Defaults to {}.

_visited¶: Tags vertices that appear at the max depth or of which all their children are tagged. Defaults to set.

inverse¶: True if the inverse algorithm must be used, False otherwise. Defaults to False.

split¶: True if the split algorithm must be used, False otherwise. Defaults to False.

abstract fit(kg)¶

Fits the sampling strategy.

Parameters: kg (KG) – The Knowledge Graph.
Raises: SamplerNotSupported – If there is an attempt to use an invalid sampling strategy with a remote Knowledge Graph.
Return type: None

abstract get_weight(hop)¶

Gets the weight of a hop in the Knowledge Graph.

Parameters: hop (Tuple[Any, Any]) – The hop of a vertex, in (predicate, object) form, whose weight is requested.
Returns: The weight of a given hop.
Raises: NotImplementedError – If this method is called without an implementation having been provided.

get_weights(hops)¶

Gets the weights of the provided hops.

Parameters: hops (List[Tuple[Any, Any]]) – The hops whose weights are requested.
Return type: Optional[List[float]]
Returns: The weights for the edges of the Knowledge Graph.
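
One way to turn raw hop weights into sampling probabilities is sketched below. The normalization and the way `inverse` flips the ordering are assumptions made for illustration; this section does not spell out the library's exact formula, and the `split` option is not modeled here. The function name `normalize_weights` is hypothetical.

```python
from typing import List, Optional

def normalize_weights(
    weights: List[float], inverse: bool = False
) -> Optional[List[float]]:
    """Scale raw weights so they sum to 1, optionally favouring low weights."""
    if not weights:
        return None
    if inverse:
        # Reflect each weight within [min, max] so the ordering is reversed.
        low, high = min(weights), max(weights)
        weights = [high - (w - low) for w in weights]
    total = sum(weights)
    if total == 0:
        # Fall back to a uniform distribution when every weight is zero.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

probabilities = normalize_weights([3, 1])
inverse_probabilities = normalize_weights([3, 1], inverse=True)
```

With raw weights 3 and 1, the first hop is chosen three times out of four. With `inverse=True` in this sketch, the preference is swapped.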

property random_state: Optional[int]¶

Gets the random state.

Return type: Optional[int]
Returns: The random state.

sample_hop(kg, walk, is_last_hop, is_reverse=False)¶

Samples an unvisited random hop in the (predicate, object) form, according to the weight of hops for a given walk.

Parameters

kg (KG) – The Knowledge Graph.
walk (Tuple[Any, ...]) – The walk with one or several vertices.
is_last_hop (bool) – True if the next hop to be visited is the last one for the desired depth, False otherwise.
is_reverse (bool) – True to get the parent neighbors instead of the child neighbors, False otherwise. Defaults to False.

Return type

Optional[Tuple[Any, Any]]

Returns

An unvisited hop in the (predicate, object) form.
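
The selection step can be pictured as a weighted random draw restricted to hops that have not been tagged as visited. The sketch below is a simplification and not the library's code: `visited` holds bare hops instead of the (hop, depth) pairs suggested by the `visited` property, and the function name and arguments are hypothetical.

```python
import random

def sample_unvisited_hop(hops, weights, visited, rng):
    """Draw one unvisited hop at random, proportionally to its weight."""
    candidates = [
        (hop, weight) for hop, weight in zip(hops, weights) if hop not in visited
    ]
    if not candidates:
        # Every hop is already tagged, so there is nothing left to sample.
        return None
    candidate_hops, candidate_weights = zip(*candidates)
    return rng.choices(candidate_hops, weights=candidate_weights, k=1)[0]

# A seeded generator keeps the draw reproducible, as a random state would.
rng = random.Random(42)
hops = [("knows", "bob"), ("livesIn", "paris")]
weights = [0.75, 0.25]

only_unvisited = sample_unvisited_hop(hops, weights, {("knows", "bob")}, rng)
nothing_left = sample_unvisited_hop(hops, weights, set(hops), rng)
```

When a single hop remains unvisited it is always returned, and when all hops are visited the function returns None, which matches the documented Optional return type.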

property visited: Set[Tuple[Tuple[Any, Any], int]]¶

Gets the tagged vertices that appear at the max depth or of which all their children are tagged.

Return type: Set[Tuple[Tuple[Any, Any], int]]
Returns: The tagged vertices.

class pyrdf2vec.samplers.UniformSampler¶

Bases: pyrdf2vec.samplers.sampler.Sampler

Uniform sampling strategy that assigns a uniform weight to each edge in a Knowledge Graph, in order to prioritize walks with strongly connected entities.

_is_support_remote¶

True if the sampling strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

Type: bool

_random_state¶: The random state to use to keep random determinism with the sampling strategy. Defaults to None.

_vertices_deg¶: The degree of the vertices. Defaults to {}.

_visited¶: Tags vertices that appear at the max depth or of which all their children are tagged. Defaults to set.

inverse¶: True if the inverse algorithm must be used, False otherwise. Defaults to False.

split¶: True if the split algorithm must be used, False otherwise. Defaults to False.

fit(kg)¶

Since the weights are uniform, this function does nothing.

Parameters: kg (KG) – The Knowledge Graph.
Return type: None

get_weight(hop)¶

Gets the weight of a hop in the Knowledge Graph.

Parameters: hop (Tuple[Any, Any]) – The hop of a vertex, in (predicate, object) form, whose weight is requested.
Return type: int
Returns: The weight of a given hop.

class pyrdf2vec.samplers.WideSampler(inverse=False, split=False)¶

Bases: pyrdf2vec.samplers.sampler.Sampler

Wide sampling node-centric sampling strategy which gives priority to walks containing edges with the highest degree of predicates and objects. The degree of a predicate and an object is defined by the number of predicates and objects present in its neighborhood, and also by their number of occurrences in a Knowledge Graph.

_is_support_remote¶: True if the sampling strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to False.

_random_state¶: The random state to use to keep random determinism with the sampling strategy. Defaults to None.

_vertices_deg¶: The degree of the vertices. Defaults to {}.

_visited¶: Tags vertices that appear at the max depth or of which all their children are tagged. Defaults to set.

inverse¶: True if the inverse algorithm must be used, False otherwise. Defaults to False.

split¶: True if the split algorithm must be used, False otherwise. Defaults to False.

fit(kg)¶

Fits the sampling strategy by counting the number of available neighbors for each vertex, and also by counting the number of times each predicate and each object occurs in the Knowledge Graph.

Parameters: kg (KG) – The Knowledge Graph.
Return type: None
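
The three counts mentioned above (neighbors per vertex, predicate occurrences, object occurrences) could be gathered and combined as in the sketch below. The combination formula in `wide_weight` is an assumption chosen for illustration, since this section does not state how the library merges the counts, and all names are hypothetical.

```python
from collections import Counter

def fit_wide(triples):
    """Collect predicate, object and neighbor counts from the triples."""
    pred_counts = Counter(p for _, p, _ in triples)
    obj_counts = Counter(o for _, _, o in triples)
    neighbor_counts = Counter()
    for subject, _predicate, obj in triples:
        neighbor_counts[subject] += 1  # outgoing neighbor of the subject
        neighbor_counts[obj] += 1      # incoming neighbor of the object
    return pred_counts, obj_counts, neighbor_counts

def wide_weight(hop, pred_counts, obj_counts, neighbor_counts):
    """Assumed formula: neighbor count times the mean of the two frequencies."""
    if not (pred_counts and obj_counts and neighbor_counts):
        raise ValueError("The sampling strategy must be fitted first.")
    predicate, obj = hop
    return neighbor_counts[obj] * (pred_counts[predicate] + obj_counts[obj]) / 2

# Hypothetical toy graph as (subject, predicate, object) triples.
triples = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("alice", "livesIn", "paris"),
]
pred_counts, obj_counts, neighbor_counts = fit_wide(triples)
weight_bob = wide_weight(("knows", "bob"), pred_counts, obj_counts, neighbor_counts)
weight_paris = wide_weight(
    ("livesIn", "paris"), pred_counts, obj_counts, neighbor_counts
)
```

Under this assumed formula, the hop to "bob" outweighs the hop to "paris" because both its predicate and its object are more frequent and "bob" has more neighbors.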

get_weight(hop)¶

Gets the weight of a hop in the Knowledge Graph.

Parameters: hop (Tuple[Any, Any]) – The hop of a vertex, in (predicate, object) form, whose weight is requested.
Return type: float
Returns: The weight of a given hop.
Raises: ValueError – If there is an attempt to access the weight of a hop without the sampling strategy having been trained.