pyrdf2vec.walkers package¶

Submodules¶

Module contents¶

class pyrdf2vec.walkers.AnonymousWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8)¶

Bases: pyrdf2vec.walkers.random.RandomWalker

Anonymous walking strategy which transforms each vertex name, other than the root node, into positional information in order to anonymize the randomly extracted walks.

_is_support_remote¶: True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg¶: The global KG used later on for the worker process. Defaults to None.

max_depth¶: The maximum depth of one walk.

max_walks¶: The maximum number of walks per entity. Defaults to None.

random_state¶: The random state to use to make the walking strategy deterministic. Defaults to None.

sampler¶: The sampling strategy. Defaults to UniformSampler.

with_reverse¶: True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth in which the entity is centered. False otherwise. Defaults to False.
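The anonymization described for this class can be illustrated with a minimal, standalone sketch. It assumes that "positional information" means the index of a vertex's first occurrence in the walk, and the helper name anonymize is hypothetical (it is not part of pyrdf2vec):

```python
from typing import Tuple


def anonymize(walk: Tuple[str, ...]) -> Tuple[str, ...]:
    """Replace every vertex except the root by the position of its
    first occurrence in the walk (assumed reading of 'positional
    information'; hypothetical helper, not the pyrdf2vec API)."""
    root = walk[0]
    return tuple(
        vertex if vertex == root else str(walk.index(vertex))
        for vertex in walk
    )


# Two walks with different vertex names but the same structure
# end up with the same anonymous pattern after the root.
print(anonymize(("a", "p", "b", "p", "a")))
print(anonymize(("a", "q", "c", "q", "a")))
```

Because only positions are kept, walks that share a structure but not their vertex names become indistinguishable beyond the root.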

class pyrdf2vec.walkers.CommunityWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, hop_prob=0.1, md5_bytes=8, resolution=1)¶

Bases: pyrdf2vec.walkers.walker.Walker

Community walking strategy which groups vertices with similar properties through probabilities and relations that are not explicitly modeled in a Knowledge Graph. Similar to the Random walking strategy, the Depth First Search (DFS) algorithm is used if a maximum number of walks is specified. Otherwise, the Breadth First Search (BFS) algorithm is chosen.

_is_support_remote¶: True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

hop_prob¶: The probability to hop. Defaults to 0.1.

kg¶: The global KG used later on for the worker process. Defaults to None.

max_depth¶: The maximum depth of one walk.

max_walks¶: The maximum number of walks per entity. Defaults to None.

md5_bytes¶: The number of bytes to keep after hashing objects in MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state¶: The random state to use to make the walking strategy deterministic. Defaults to None.

resolution¶: The resolution to use. Defaults to 1.

sampler¶: The sampling strategy. Defaults to UniformSampler.

with_reverse¶: True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth in which the entity is centered. False otherwise. Defaults to False.
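The effect of md5_bytes can be sketched with the standard hashlib module. This is only an illustration under stated assumptions: the helper name hash_object is hypothetical, and returning the truncated digest as a hexadecimal string is a choice made here for readability, not necessarily the representation the library uses:

```python
import hashlib
from typing import Optional


def hash_object(text: str, md5_bytes: Optional[int] = 8) -> str:
    """Keep only the first md5_bytes bytes of the MD5 digest of text.

    Hypothetical helper: returns the text unchanged when md5_bytes
    is None, mirroring the 'no hash is applied' behaviour above.
    """
    if md5_bytes is None:
        return text
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return digest[:md5_bytes].hex()


long_literal = "a very long literal value " * 100
# 8 bytes are kept, i.e. 16 hexadecimal characters, whatever the input size.
print(len(long_literal), len(hash_object(long_literal)))
```

A smaller md5_bytes saves more memory but raises the chance that two distinct objects collide on the same truncated hash.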

extract(kg, entities, verbose=0)¶

Fits the provided sampling strategy and then calls the private _extract method that is implemented for each of the walking strategies.

Parameters

kg (KG) – The Knowledge Graph. The graph from which the neighborhoods are extracted for the provided entities.
entities (List[str]) – The entities to be extracted from the Knowledge Graph.
verbose (int) – The verbosity level. 0: does not display anything; 1: display of the progress of extraction and training of walks; 2: debugging. Defaults to 0.

Return type

List[List[Tuple[str, ...]]]

Returns

A list with one entry per provided entity, each entry being the list of walks (tuples of strings) extracted for that entity.

extract_walks(kg, entity)¶

Extracts random walks of depth - 1 hops rooted in the provided entity.

Parameters

kg (KG) – The Knowledge Graph.
entity (Vertex) – The root node from which to extract walks.

Return type

List[Tuple[Any, ...]]

Returns

The list of unique walks for the provided entity.

class pyrdf2vec.walkers.HALKWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, freq_thresholds=NOTHING)¶

Bases: pyrdf2vec.walkers.random.RandomWalker

HALK walking strategy which removes rare vertices from randomly extracted walks, increasing the quality of the generated embeddings while memory usage decreases.

_is_support_remote¶: True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

freq_thresholds¶: The minimum frequency thresholds of a (predicate, object) hop to be kept. Beware that the accumulation of several freq_thresholds extracts more walks, which is not always desirable. Defaults to [0.01].

kg¶: The global KG used later on for the worker process. Defaults to None.

max_depth¶: The maximum depth of one walk.

max_walks¶: The maximum number of walks per entity. Defaults to None.

md5_bytes¶: The number of bytes to keep after hashing objects in MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state¶: The random state to use to make the walking strategy deterministic. Defaults to None.

sampler¶: The sampling strategy. Defaults to UniformSampler.

with_reverse¶: True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth in which the entity is centered. False otherwise. Defaults to False.

build_dictionary(walks)¶

Builds a dictionary that maps each predicate to the identifiers of the walks in which it appears.

Parameters: walks (List[Tuple[str, ...]]) – The walks from which to build the dictionary.
Return type: DefaultDict[str, Set[int]]
Returns: The dictionary of predicate names.

get_rare_predicates(vertex_to_windices, walks, freq_threshold)¶

Gets the vertices that do not reach a given threshold of frequency of occurrence.

Parameters

vertex_to_windices (DefaultDict[str, Set[int]]) – The dictionary that maps each predicate to the identifiers of the walks in which it appears.
walks (List[Tuple[str, ...]]) – The walks.
freq_threshold (float) – The threshold frequency of occurrence.

Return type

Set[str]

Returns

The infrequent vertices.
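The two methods above can be sketched together on plain tuples of strings. This is a simplified illustration, not the library's code: it assumes that the frequency of a vertex is the fraction of walks containing it, that the root is excluded from the dictionary, and the helper prune_walks is hypothetical:

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Walk = Tuple[str, ...]


def build_dictionary(walks: List[Walk]) -> DefaultDict[str, Set[int]]:
    """Map every non-root vertex to the indices of the walks containing it."""
    vertex_to_windices: DefaultDict[str, Set[int]] = defaultdict(set)
    for i, walk in enumerate(walks):
        for vertex in walk[1:]:  # assumption: the root is skipped
            vertex_to_windices[vertex].add(i)
    return vertex_to_windices


def get_rare_predicates(
    vertex_to_windices: DefaultDict[str, Set[int]],
    walks: List[Walk],
    freq_threshold: float,
) -> Set[str]:
    """Return vertices present in less than freq_threshold of the walks."""
    return {
        vertex
        for vertex, windices in vertex_to_windices.items()
        if len(windices) / len(walks) < freq_threshold
    }


def prune_walks(walks: List[Walk], rare: Set[str]) -> List[Walk]:
    """Hypothetical helper: drop rare vertices, always keeping the root."""
    return [
        tuple(v for i, v in enumerate(walk) if i == 0 or v not in rare)
        for walk in walks
    ]


walks = [("e", "p", "a"), ("e", "p", "b"), ("e", "p", "a"), ("e", "q", "a")]
rare = get_rare_predicates(build_dictionary(walks), walks, 0.5)
print(rare, prune_walks(walks, rare))
```

With a threshold of 0.5, only the vertices that appear in fewer than half of the four walks are removed.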

class pyrdf2vec.walkers.NGramWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, grams=3, wildcards=None)¶

Bases: pyrdf2vec.walkers.random.RandomWalker

N-Gram walking strategy which relabels the n-grams in random walks to define a one-to-many mapping. The intuition behind this is that the predecessors of a node that two different walks have in common can be different.

_is_support_remote¶: True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

_n_gram_map¶: Stores the mapping of N-gram. Defaults to {}.

grams¶: The N-gram to relabel. Defaults to 3.

kg¶: The global KG used later on for the worker process. Defaults to None.

max_depth¶: The maximum depth of one walk.

max_walks¶: The maximum number of walks per entity. Defaults to None.

random_state¶: The random state to use to make the walking strategy deterministic. Defaults to None.

sampler¶: The sampling strategy. Defaults to UniformSampler.

wildcards¶: The wildcards to be used to match sub-sequences with small differences to be mapped onto the same label. Defaults to None.
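The n-gram relabeling idea can be sketched as follows. This is a simplified illustration rather than the library's implementation: the function name relabel_n_grams is hypothetical, wildcards are ignored, and it assumes that every vertex from position grams onward is replaced by a label identifying the n-gram that ends at that vertex:

```python
from typing import Dict, Tuple

Walk = Tuple[str, ...]


def relabel_n_grams(
    walk: Walk, grams: int, n_gram_map: Dict[Walk, str]
) -> Walk:
    """Relabel vertices by the n-gram (the vertex and its predecessors)
    that ends at them. n_gram_map is shared across walks so that the
    same n-gram always receives the same label."""
    relabeled = []
    for i, vertex in enumerate(walk):
        if i < grams:
            # Too few predecessors for a full n-gram: keep the name.
            relabeled.append(vertex)
            continue
        n_gram = tuple(walk[i - grams + 1 : i + 1])
        if n_gram not in n_gram_map:
            n_gram_map[n_gram] = str(len(n_gram_map))
        relabeled.append(n_gram_map[n_gram])
    return tuple(relabeled)


mapping: Dict[Walk, str] = {}
first = relabel_n_grams(("r", "p1", "a", "p2", "x"), 3, mapping)
second = relabel_n_grams(("r", "p1", "b", "p2", "x"), 3, mapping)
# Both walks end in "x", but its predecessors differ, so the labels differ.
print(first, second)
```

The same vertex "x" maps to several labels depending on what precedes it, which is the one-to-many mapping described above.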

class pyrdf2vec.walkers.RandomWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8)¶

Bases: pyrdf2vec.walkers.walker.Walker

Random walking strategy which extracts walks from a root node using the Depth First Search (DFS) algorithm if a maximum number of walks is specified, otherwise the Breadth First Search (BFS) algorithm is used.

_is_support_remote¶: True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg¶: The global KG used later on for the worker process. Defaults to None.

max_depth¶: The maximum depth of one walk.

max_walks¶: The maximum number of walks per entity. Defaults to None.

md5_bytes¶: The number of bytes to keep after hashing objects in MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state¶: The random state to use to make the walking strategy deterministic. Defaults to None.

sampler¶: The sampling strategy. Defaults to UniformSampler.

with_reverse¶: True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth in which the entity is centered. False otherwise. Defaults to False.

extract_walks(kg, entity)¶

Extracts random walks for an entity based on Knowledge Graph using the Depth First Search (DFS) algorithm if a maximum number of walks is specified, otherwise the Breadth First Search (BFS) algorithm is used.

Parameters

kg (KG) – The Knowledge Graph.
entity (Vertex) – The root node from which to extract walks.

Return type

List[Tuple[Any, ...]]

Returns

The list of unique walks for the provided entity.
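The difference between the two search modes can be illustrated on a toy graph. The sketch below is not the library's code: the graph is a plain dictionary mapping a vertex to its (predicate, object) hops, the function names are hypothetical, and uniform random choice stands in for the sampling strategy:

```python
import random
from typing import Dict, List, Set, Tuple

Hop = Tuple[str, str]
Walk = Tuple[str, ...]

# Toy graph (assumption): vertex -> outgoing (predicate, object) hops.
TOY_KG: Dict[str, List[Hop]] = {
    "a": [("p", "b"), ("q", "c")],
    "b": [("r", "d")],
    "c": [],
    "d": [],
}


def bfs_walks(kg: Dict[str, List[Hop]], root: str, max_depth: int) -> Set[Walk]:
    """Exhaustively extend every walk, one hop per level (no walk limit)."""
    walks: Set[Walk] = {(root,)}
    for _ in range(max_depth):
        for walk in list(walks):
            hops = kg.get(walk[-1], [])
            if hops:
                walks.remove(walk)
                for pred, obj in hops:
                    walks.add(walk + (pred, obj))
    return walks


def dfs_walks(
    kg: Dict[str, List[Hop]],
    root: str,
    max_depth: int,
    max_walks: int,
    rng: random.Random,
) -> Set[Walk]:
    """Sample at most max_walks walks, following one random hop at a time."""
    walks: Set[Walk] = set()
    for _ in range(max_walks):
        walk: Walk = (root,)
        for _ in range(max_depth):
            hops = kg.get(walk[-1], [])
            if not hops:
                break
            walk += rng.choice(hops)
        walks.add(walk)
    return walks


print(bfs_walks(TOY_KG, "a", 2))
print(dfs_walks(TOY_KG, "a", 2, max_walks=1, rng=random.Random(42)))
```

The exhaustive variant grows with the branching factor of the graph, which is why capping the number of walks matters on large Knowledge Graphs.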

class pyrdf2vec.walkers.SplitWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, func_split=None)¶

Bases: pyrdf2vec.walkers.random.RandomWalker

Splitting walking strategy which splits each vertex (except the root node) present in the randomly extracted walks.

_is_support_remote¶: True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg¶: The global KG used later on for the worker process. Defaults to None.

max_depth¶: The maximum depth of one walk.

max_walks¶: The maximum number of walks per entity. Defaults to None.

md5_bytes¶: The number of bytes to keep after hashing objects in MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state¶: The random state to use to make the walking strategy deterministic. Defaults to None.

sampler¶: The sampling strategy. Defaults to UniformSampler.

with_reverse¶: True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth in which the entity is centered. False otherwise. Defaults to False.

func_split¶: The function to call to split the vertices. A custom implementation must respect the signature of the basic_split function. Defaults to func_split.

basic_split(walks)¶

Splits the vertices of the random walks extracted for an entity. To achieve this, each vertex (except the root node) is split according to symbols and capitalization, and any duplicates are removed.

Some examples:

('http://dl-learner.org/carcinogenesis#d19', 'http://dl-learner.org/carcinogenesis#hasBond', 'http://dl-learner.org/carcinogenesis#bond3209')

-> ('http://dl-learner.org/carcinogenesis#d19', 'has', 'bond', '3209')

('http://dl-learner.org/carcinogenesis#d19', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://dl-learner.org/carcinogenesis#Compound')

-> ('http://dl-learner.org/carcinogenesis#d19', 'type', 'compound')

Parameters: walks (List[Tuple[Any, ...]]) – The randomly extracted walks.
Return type: Set[Tuple[str, ...]]
Returns: The set of tuples that contain the split walks.
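The splitting behaviour shown in the examples can be approximated with a short regular-expression sketch. This is not the library's implementation: the function name split_walk and the token pattern are assumptions chosen so that the two documented examples are reproduced, and the sketch works on plain strings rather than Vertex objects:

```python
import re
from typing import Tuple

# Assumed token pattern: a word with an optional leading capital,
# a run of capitals (acronym), or a run of digits.
TOKEN = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def split_walk(walk: Tuple[str, ...]) -> Tuple[str, ...]:
    """Keep the root as-is, split every other vertex into lowercase
    tokens, and drop tokens already seen in this walk."""
    root, *rest = walk
    tokens = []
    for vertex in rest:
        fragment = vertex.rsplit("#", 1)[-1]  # part after the last '#'
        for token in TOKEN.findall(fragment):
            token = token.lower()
            if token not in tokens:
                tokens.append(token)
    return (root, *tokens)


walk = (
    "http://dl-learner.org/carcinogenesis#d19",
    "http://dl-learner.org/carcinogenesis#hasBond",
    "http://dl-learner.org/carcinogenesis#bond3209",
)
print(split_walk(walk))
```

A function with different splitting rules could be supplied through func_split, provided it keeps the same signature as basic_split.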

class pyrdf2vec.walkers.WLWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, wl_iterations=4)¶

Bases: pyrdf2vec.walkers.random.RandomWalker

Weisfeiler-Lehman walking strategy which relabels the nodes of the extracted random walks, providing additional information about the entity representations only when a maximum number of walks is not specified.

_inv_label_map¶: Stores the mapping of the inverse labels. Defaults to defaultdict.

_is_support_remote¶: True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to False.

_label_map¶: Stores the mapping of the labels. Defaults to defaultdict.

kg¶: The global KG used later on for the worker process. Defaults to None.

max_depth¶: The maximum depth of one walk.

max_walks¶: The maximum number of walks per entity. Defaults to None.

md5_bytes¶: The number of bytes to keep after hashing objects in MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state¶: The random state to use to make the walking strategy deterministic. Defaults to None.

sampler¶: The sampling strategy. Defaults to UniformSampler.

wl_iterations¶: The number of Weisfeiler-Lehman iterations. Defaults to 4.
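To give an intuition for Weisfeiler-Lehman relabeling and for what wl_iterations controls, here is a generic sketch on a toy adjacency dictionary. It shows one common formulation of the technique (a vertex's new label is its previous label joined with the sorted labels of its neighbours); the function name wl_relabel and the graph representation are assumptions, and the library's exact labeling scheme may differ:

```python
from typing import Dict, List


def wl_relabel(
    neighbors: Dict[str, List[str]], iterations: int
) -> List[Dict[str, str]]:
    """Return the labels of every vertex after 0..iterations rounds.

    neighbors maps each vertex to the vertices it aggregates from;
    every neighbour must itself be a key of the dictionary.
    """
    labels = {vertex: vertex for vertex in neighbors}
    history = [dict(labels)]
    for _ in range(iterations):
        new_labels = {}
        for vertex, nbrs in neighbors.items():
            suffix = "-".join(sorted({labels[n] for n in nbrs}))
            new_labels[vertex] = f"{labels[vertex]}-{suffix}"
        labels = new_labels
        history.append(dict(labels))
    return history


toy = {"a": ["b", "c"], "b": [], "c": ["b"]}
for round_index, labels in enumerate(wl_relabel(toy, 2)):
    print(round_index, labels)
```

Each extra iteration folds information from one hop further away into every label, so more iterations yield more specific labels.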

extract(kg, entities, verbose=0)¶

Fits the provided sampling strategy and then calls the private _extract method that is implemented for each of the walking strategies.

Parameters

kg (KG) – The Knowledge Graph.
entities (List[str]) – The entities to be extracted from the Knowledge Graph.
verbose (int) – The verbosity level. 0: does not display anything; 1: display of the progress of extraction and training of walks; 2: debugging. Defaults to 0.

Return type

List[List[Tuple[str, ...]]]

Returns

A list with one entry per provided entity, each entry being the list of walks (tuples of strings) extracted for that entity.

class pyrdf2vec.walkers.Walker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None)¶

Bases: abc.ABC

Base class of the walking strategies.

_is_support_remote¶: True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg¶: The global KG used later on for the worker process. Defaults to None.

max_depth¶: The maximum depth of one walk.

max_walks¶: The maximum number of walks per entity. Defaults to None.

random_state¶: The random state to use to make the walking strategy deterministic. Defaults to None.

sampler¶: The sampling strategy. Defaults to UniformSampler.

with_reverse¶: True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth in which the entity is centered. False otherwise. This doesn’t work with NGramWalker and WLWalker. Defaults to False.
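The arithmetic behind with_reverse can be made concrete with a small sketch. It assumes that walks towards the entity (parent hops) and walks away from it (child hops) are extracted separately and then combined pairwise; the function name center_entity is hypothetical:

```python
from typing import List, Tuple

Walk = Tuple[str, ...]


def center_entity(parent_walks: List[Walk], child_walks: List[Walk]) -> List[Walk]:
    """Join every parent walk (ending at the entity) with every child
    walk (starting at the entity) so the entity sits in the middle."""
    return [
        parent[:-1] + child  # drop the duplicated entity from the parent walk
        for child in child_walks
        for parent in parent_walks
    ]


parents = [("x", "p", "e"), ("y", "q", "e")]
children = [("e", "r", "z"), ("e", "s", "w")]
combined = center_entity(parents, children)
# 2 parent walks * 2 child walks = 4 walks, each spanning both directions.
print(len(combined), combined[0])
```

With up to max_walks walks in each direction, the pairwise combination yields up to max_walks * max_walks walks, each covering up to 2 * depth hops.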

extract(kg, entities, verbose=0)¶

Fits the provided sampling strategy and then calls the private _extract method that is implemented for each of the walking strategies.

Parameters

kg (KG) – The Knowledge Graph.
entities (List[str]) – The entities to be extracted from the Knowledge Graph.
verbose (int) – The verbosity level. 0: does not display anything; 1: display of the progress of extraction and training of walks; 2: debugging. Defaults to 0.

Return type

List[List[Tuple[str, ...]]]

Returns

A list with one entry per provided entity, each entry being the list of walks (tuples of strings) extracted for that entity.

Raises

WalkerNotSupported – If a walking strategy that does not support remote Knowledge Graphs is used with a remote Knowledge Graph.

class pyrdf2vec.walkers.WalkletWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8)¶

Bases: pyrdf2vec.walkers.random.RandomWalker

Walklets walking strategy which transforms randomly extracted walks into walklets, which are walks of size one or two that include the root node and potentially another vertex that can be a predicate or an object.

_is_support_remote¶: True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg¶: The global KG used later on for the worker process. Defaults to None.

max_depth¶: The maximum depth of one walk.

max_walks¶: The maximum number of walks per entity. Defaults to None.

random_state¶: The random state to use to make the walking strategy deterministic. Defaults to None.

sampler¶: The sampling strategy. Defaults to UniformSampler.

with_reverse¶: True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth in which the entity is centered. False otherwise. Defaults to False.
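The walklet transformation described for this class can be sketched on plain tuples of strings. The function name to_walklets is hypothetical, and the sketch assumes that each walk yields one pair (root, vertex) per non-root vertex, or the root alone when the walk has no other vertex:

```python
from typing import Set, Tuple

Walk = Tuple[str, ...]


def to_walklets(walk: Walk) -> Set[Walk]:
    """Turn one walk into walklets of size one or two rooted at walk[0]."""
    root = walk[0]
    if len(walk) == 1:
        return {(root,)}
    return {(root, vertex) for vertex in walk[1:]}


print(to_walklets(("e", "p", "o")))
print(to_walklets(("e",)))
```

Since each walklet pairs the root with a single predicate or object, the order of the intermediate hops in the original walk is no longer represented.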