pyrdf2vec.walkers package

Submodules

Module contents


class pyrdf2vec.walkers.AnonymousWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8)

Bases: pyrdf2vec.walkers.random.RandomWalker

Anonymous walking strategy which transforms each vertex name, other than the root node, into positional information in order to anonymize the randomly extracted walks.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

random_state

The random state used to make the walking strategy deterministic. Defaults to None.

sampler

The sampling strategy. Defaults to UniformSampler.

with_reverse

True to extract both parent and child hops from an entity, creating (max_walks * max_walks) more walks of 2 * depth and centering the entity in the walks. False otherwise. Defaults to False.
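The anonymization described for AnonymousWalker can be pictured with a small standalone sketch. It does not call pyrdf2vec: the function name anonymize_walk is hypothetical, and using the index of a vertex's first occurrence as its positional label is an assumption made for illustration.

```python
from typing import Tuple


def anonymize_walk(walk: Tuple[str, ...]) -> Tuple[str, ...]:
    """Replace every vertex except the root with a positional label.

    Assumption: the positional label of a vertex is the index at which
    its name first occurs in the walk.
    """
    names = list(walk)
    return tuple(
        name if i == 0 else str(names.index(name))
        for i, name in enumerate(names)
    )


# Hypothetical walk: root, predicate, object, predicate, object.
walk = ("ex:Alice", "ex:knows", "ex:Bob", "ex:knows", "ex:Carol")
print(anonymize_walk(walk))
```

The root keeps its name, and the repeated predicate receives the same label both times, so the structure of the walk survives while the vertex names do not.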

class pyrdf2vec.walkers.CommunityWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, hop_prob=0.1, md5_bytes=8, resolution=1)

Bases: pyrdf2vec.walkers.walker.Walker

Community walking strategy which groups vertices with similar properties through probabilities and relations that are not explicitly modeled in a Knowledge Graph. Similar to the Random walking strategy, the Depth First Search (DFS) algorithm is used if a maximum number of walks is specified. Otherwise, the Breadth First Search (BFS) algorithm is chosen.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

hop_prob

The probability to hop. Defaults to 0.1.

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

md5_bytes

The number of bytes to keep after hashing objects with MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state

The random state used to make the walking strategy deterministic. Defaults to None.

resolution

The resolution to use. Defaults to 1.

sampler

The sampling strategy. Defaults to UniformSampler.

with_reverse

True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth and centering the entity in the walks. False otherwise. Defaults to False.

extract(kg, entities, verbose=0)

Fits the provided sampling strategy and then calls the private _extract method that is implemented for each of the walking strategies.

Parameters
  • kg (KG) – The Knowledge Graph from which the neighborhoods are extracted for the provided entities.

  • entities (List[str]) – The entities to be extracted from the Knowledge Graph.

  • verbose (int) – The verbosity level. 0 displays nothing; 1 displays the progress of walk extraction and training; 2 enables debugging output. Defaults to 0.

Return type

List[List[Tuple[str, ...]]]

Returns

The walks extracted for the provided entities, as one list of walks per entity.

extract_walks(kg, entity)

Extracts random walks of depth - 1 hops rooted in the given root entity.

Parameters
  • kg (KG) – The Knowledge Graph.

  • entity (Vertex) – The root node from which to extract walks.

Return type

List[Tuple[Any, ...]]

Returns

The list of unique walks for the provided entity.
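The role of hop_prob in CommunityWalker can be sketched as follows. This is not the library's code: the function extend_walk, the dictionaries community_of and members, and the decision to replace the regular object with a community member when a hop occurs are all assumptions for illustration. Community detection itself is left out, and the community assignment is supplied by hand.

```python
import random
from typing import Dict, List, Tuple


def extend_walk(
    walk: Tuple[str, ...],
    pred: str,
    obj: str,
    community_of: Dict[str, int],
    members: Dict[int, List[str]],
    hop_prob: float,
    rng: random.Random,
) -> Tuple[str, ...]:
    """Extend a walk by one hop, occasionally jumping inside a community.

    With probability hop_prob, the object is swapped for a random vertex
    from the same community (assumed behavior); otherwise the regular
    (predicate, object) hop is appended.
    """
    if obj in community_of and rng.random() < hop_prob:
        target = rng.choice(members[community_of[obj]])
        return walk + (pred, target)
    return walk + (pred, obj)


# Hand-made community assignment for a toy graph.
community_of = {"b": 0, "c": 0}
members = {0: ["b", "c"]}
rng = random.Random(42)
print(extend_walk(("a",), "p", "b", community_of, members, 0.1, rng))
```

A small hop_prob keeps most walks faithful to the explicit edges while still letting some walks connect vertices that only share a community.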

class pyrdf2vec.walkers.HALKWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, freq_thresholds=NOTHING)

Bases: pyrdf2vec.walkers.random.RandomWalker

HALK walking strategy which removes rare vertices from randomly extracted walks, increasing the quality of the generated embeddings while memory usage decreases.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

freq_thresholds

The minimum frequency thresholds of a (predicate, object) hop to be kept. Beware that the accumulation of several freq_thresholds extracts more walks, which is not always desirable. Defaults to [0.01].

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

md5_bytes

The number of bytes to keep after hashing objects with MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state

The random state used to make the walking strategy deterministic. Defaults to None.

sampler

The sampling strategy. Defaults to UniformSampler.

with_reverse

True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth and centering the entity in the walks. False otherwise. Defaults to False.

build_dictionary(walks)

Builds a dictionary that maps each predicate to the identifiers of the walk(s) in which it appears.

Parameters

walks (List[Tuple[str, ...]]) – The walks from which to build the dictionary.

Return type

DefaultDict[str, Set[int]]

Returns

The dictionary of predicate names.

get_rare_predicates(vertex_to_windices, walks, freq_threshold)

Gets the vertices which do not reach a certain threshold of frequency of occurrence.

Parameters
  • vertex_to_windices (DefaultDict[str, Set[int]]) – The dictionary that maps each predicate to the identifiers of the walk(s) in which it appears.

  • walks (List[Tuple[str, ...]]) – The walks.

  • freq_threshold (float) – The threshold frequency of occurrence.

Return type

Set[str]

Returns

The infrequent vertices.
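The two HALKWalker helpers above can be approximated with a short standalone sketch. The function names build_index, rare_vertices and prune_walks are hypothetical, and the sketch assumes that the frequency of a vertex is the share of walks containing it and that the root is never counted or removed.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Walk = Tuple[str, ...]


def build_index(walks: List[Walk]) -> DefaultDict[str, Set[int]]:
    """Map each non-root vertex to the indices of the walks containing it."""
    index: DefaultDict[str, Set[int]] = defaultdict(set)
    for i, walk in enumerate(walks):
        for vertex in walk[1:]:
            index[vertex].add(i)
    return index


def rare_vertices(
    index: DefaultDict[str, Set[int]], n_walks: int, freq_threshold: float
) -> Set[str]:
    """Return vertices that occur in fewer than freq_threshold of the walks."""
    return {v for v, ids in index.items() if len(ids) / n_walks < freq_threshold}


def prune_walks(walks: List[Walk], rare: Set[str]) -> List[Walk]:
    """Drop rare vertices from every walk, always keeping the root."""
    return [
        tuple(v for i, v in enumerate(walk) if i == 0 or v not in rare)
        for walk in walks
    ]


walks = [("a", "p", "b"), ("a", "p", "c"), ("a", "p", "b"), ("a", "q", "b")]
index = build_index(walks)
rare = rare_vertices(index, len(walks), 0.3)
print(sorted(rare))
print(prune_walks(walks, rare))
```

Here "c" and "q" each appear in one walk out of four (0.25), which is below the 0.3 threshold, so both are removed.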

class pyrdf2vec.walkers.NGramWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, grams=3, wildcards=None)

Bases: pyrdf2vec.walkers.random.RandomWalker

N-gram walking strategy which relabels the n-grams in random walks to define a one-to-many mapping. The intuition is that a node shared by two different walks can have different predecessors in each of them.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

_n_gram_map

Stores the mapping of N-grams to their labels. Defaults to {}.

grams

The size N of the N-grams to relabel. Defaults to 3.

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

random_state

The random state used to make the walking strategy deterministic. Defaults to None.

sampler

The sampling strategy. Defaults to UniformSampler.

wildcards

The wildcards to be used to match sub-sequences with small differences to be mapped onto the same label. Defaults to None.
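The one-to-many relabeling behind NGramWalker can be illustrated with a simplified sketch. The function relabel_ngrams and its ngram_map argument are hypothetical, wildcards are ignored, and the rule used here (only objects at even positions are relabeled, and only once at least grams vertices are available) is an assumption rather than a description of the library's exact behavior.

```python
from typing import Dict, Tuple

Walk = Tuple[str, ...]


def relabel_ngrams(walk: Walk, grams: int, ngram_map: Dict[Walk, str]) -> Walk:
    """Replace objects by a label that depends on their preceding vertices.

    The root (index 0), the predicates (odd indices) and the first few
    vertices (index < grams) keep their names. Every other object is
    replaced by the label of the n-gram that ends at that object.
    """
    relabeled = []
    for i, vertex in enumerate(walk):
        if i == 0 or i % 2 == 1 or i < grams:
            relabeled.append(vertex)
            continue
        ngram = walk[i - grams + 1 : i + 1]
        if ngram not in ngram_map:
            # A new n-gram receives the next free label.
            ngram_map[ngram] = str(len(ngram_map))
        relabeled.append(ngram_map[ngram])
    return tuple(relabeled)


ngram_map: Dict[Walk, str] = {}
print(relabel_ngrams(("a", "p", "b", "q", "c"), 3, ngram_map))
print(relabel_ngrams(("a", "p", "z", "q", "c"), 3, ngram_map))
```

Both walks end in the vertex "c", but because its predecessors differ ("b" versus "z"), it receives two different labels, which is the one-to-many mapping the description refers to.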

class pyrdf2vec.walkers.RandomWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8)

Bases: pyrdf2vec.walkers.walker.Walker

Random walking strategy which extracts walks from a root node using the Depth First Search (DFS) algorithm if a maximum number of walks is specified, otherwise the Breadth First Search (BFS) algorithm is used.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

md5_bytes

The number of bytes to keep after hashing objects with MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state

The random state used to make the walking strategy deterministic. Defaults to None.

sampler

The sampling strategy. Defaults to UniformSampler.

with_reverse

True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth and centering the entity in the walks. False otherwise. Defaults to False.

extract_walks(kg, entity)

Extracts random walks for an entity based on Knowledge Graph using the Depth First Search (DFS) algorithm if a maximum number of walks is specified, otherwise the Breadth First Search (BFS) algorithm is used.

Parameters
  • kg (KG) – The Knowledge Graph.

  • entity (Vertex) – The root node from which to extract walks.

Return type

List[Tuple[Any, ...]]

Returns

The list of unique walks for the provided entity.
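The difference between the two modes of RandomWalker can be sketched on a toy graph. Nothing here uses pyrdf2vec: the Graph alias, the functions all_walks and sampled_walks, and the adjacency dictionary are hypothetical. The exhaustive variant enumerates every walk breadth-first, and the sampled variant draws at most max_walks random depth-first descents.

```python
import random
from typing import Dict, List, Optional, Tuple

# Toy graph: vertex -> list of (predicate, object) hops.
Graph = Dict[str, List[Tuple[str, str]]]
Walk = Tuple[str, ...]


def all_walks(graph: Graph, root: str, max_depth: int) -> List[Walk]:
    """Breadth-first enumeration of every walk of up to max_depth hops."""
    walks = {(root,)}
    for _ in range(max_depth):
        extended = set()
        for walk in walks:
            hops = graph.get(walk[-1], [])
            if not hops:
                extended.add(walk)  # dead end: keep the walk as it is
            for pred, obj in hops:
                extended.add(walk + (pred, obj))
        walks = extended
    return sorted(walks)


def sampled_walks(
    graph: Graph, root: str, max_depth: int, max_walks: int,
    seed: Optional[int] = None,
) -> List[Walk]:
    """Draw at most max_walks unique walks by random depth-first descent."""
    rng = random.Random(seed)
    walks = set()
    for _ in range(max_walks):
        walk: Walk = (root,)
        for _ in range(max_depth):
            hops = graph.get(walk[-1], [])
            if not hops:
                break
            walk += rng.choice(hops)
        walks.add(walk)
    return sorted(walks)


graph: Graph = {"a": [("p", "b"), ("q", "c")], "b": [("r", "d")]}
print(all_walks(graph, "a", 2))
print(sampled_walks(graph, "a", 2, max_walks=5, seed=0))
```

Exhaustive enumeration grows quickly with depth and branching, which is why a cap on the number of walks is usually paired with sampling.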

class pyrdf2vec.walkers.SplitWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, func_split=None)

Bases: pyrdf2vec.walkers.random.RandomWalker

Splitting walking strategy which splits each vertex (except the root node) present in the randomly extracted walks.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

md5_bytes

The number of bytes to keep after hashing objects with MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state

The random state used to make the walking strategy deterministic. Defaults to None.

sampler

The sampling strategy. Defaults to UniformSampler.

with_reverse

True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth and centering the entity in the walks. False otherwise. Defaults to False.

func_split

The function to call for the splitting of vertices. A custom implementation must respect the signature of the basic_split function. Defaults to None, in which case basic_split is used.

basic_split(walks)

Splits the vertices of the random walks extracted for an entity. To achieve this, each vertex (except the root node) is split according to symbols and capitalization, and any duplicates are removed.

Some examples:

('http://dl-learner.org/carcinogenesis#d19'),

-> ('http://dl-learner.org/carcinogenesis#d19', 'has', 'bond', '3209')

('http://dl-learner.org/carcinogenesis#d19', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://dl-learner.org/carcinogenesis#Compound')

-> ('http://dl-learner.org/carcinogenesis#d19', 'type', 'compound')

Parameters

walks (List[Tuple[Any, ...]]) – The randomly extracted walks.

Return type

Set[Tuple[str, ...]]

Returns

The set of tuples that contain the split walks.
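A function with the same shape as basic_split (walks in, set of tuples out) can be sketched as follows. This is not the library's implementation: the names split_vertex and split_walks, the regular expression, and the choice to drop any token already seen in the same walk are assumptions. The example.org URIs are made up for illustration.

```python
import re
from typing import Iterable, List, Set, Tuple

# A token is a lowercase word (optionally capitalized), a run of capitals,
# or a run of digits.
_TOKEN = re.compile(r"[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_vertex(name: str) -> List[str]:
    """Split the local part of a URI on capitalization and digits."""
    local = name.rsplit("#", 1)[-1].rsplit("/", 1)[-1]
    return [token.lower() for token in _TOKEN.findall(local)]


def split_walks(walks: Iterable[Tuple[str, ...]]) -> Set[Tuple[str, ...]]:
    """Split every vertex except the root and drop repeated tokens."""
    result = set()
    for walk in walks:
        tokens: List[str] = []
        for vertex in walk[1:]:
            for token in split_vertex(vertex):
                if token not in tokens:
                    tokens.append(token)
        result.add((walk[0], *tokens))
    return result


walks = [
    (
        "http://example.org/e1",
        "http://example.org/ns#hasBond",
        "http://example.org/ns#bond3209",
    )
]
print(split_walks(walks))
```

The repeated token "bond" appears only once in the result, which mirrors the removal of duplicates described above.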

class pyrdf2vec.walkers.WLWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, wl_iterations=4)

Bases: pyrdf2vec.walkers.random.RandomWalker

Weisfeiler-Lehman walking strategy which relabels the nodes of the extracted random walks, providing additional information about the entity representations only when a maximum number of walks is not specified.

_inv_label_map

Stores the mapping of the inverse labels. Defaults to defaultdict.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to False.

_label_map

Stores the mapping of the labels. Defaults to defaultdict.

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

md5_bytes

The number of bytes to keep after hashing objects with MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state

The random state used to make the walking strategy deterministic. Defaults to None.

sampler

The sampling strategy. Defaults to UniformSampler.

wl_iterations

The number of Weisfeiler-Lehman iterations. Defaults to 4.

extract(kg, entities, verbose=0)

Fits the provided sampling strategy and then calls the private _extract method that is implemented for each of the walking strategies.

Parameters
  • kg (KG) – The Knowledge Graph.

  • entities (List[str]) – The entities to be extracted from the Knowledge Graph.

  • verbose (int) – The verbosity level. 0 displays nothing; 1 displays the progress of walk extraction and training; 2 enables debugging output. Defaults to 0.

Return type

List[List[Tuple[str, ...]]]

Returns

The walks extracted for the provided entities, as one list of walks per entity.
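The relabeling idea behind WLWalker can be illustrated with a generic Weisfeiler-Lehman iteration on a toy graph. This is a textbook-style sketch, not the library's code: the function wl_relabel, the predecessors dictionary and the way labels are joined and hashed are assumptions.

```python
import hashlib
from typing import Dict, List


def wl_relabel(
    predecessors: Dict[str, List[str]], iterations: int
) -> List[Dict[str, str]]:
    """Return one label map per iteration, starting with the vertex names.

    At each iteration, the new label of a vertex is a hash of its previous
    label combined with the sorted previous labels of its predecessors.
    Every predecessor must itself be a key of the dictionary.
    """
    labels = [{vertex: vertex for vertex in predecessors}]
    for _ in range(iterations):
        previous = labels[-1]
        current = {}
        for vertex, preds in predecessors.items():
            suffix = "-".join(sorted({previous[p] for p in preds}))
            signature = f"{previous[vertex]}-{suffix}"
            current[vertex] = hashlib.md5(signature.encode()).hexdigest()
        labels.append(current)
    return labels


# Toy graph: "b" and "c" are both reached from "a".
predecessors = {"a": [], "b": ["a"], "c": ["a"]}
labels = wl_relabel(predecessors, iterations=2)
print(len(labels))
print(labels[1]["b"] != labels[1]["c"])
```

Each iteration folds one more hop of neighborhood information into every label, so the labels of later iterations describe progressively larger subtrees.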

class pyrdf2vec.walkers.Walker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None)

Bases: abc.ABC

Base class of the walking strategies.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

random_state

The random state used to make the walking strategy deterministic. Defaults to None.

sampler

The sampling strategy. Defaults to UniformSampler.

with_reverse

True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth and centering the entity in the walks. False otherwise. This doesn’t work with NGramWalker and WLWalker. Defaults to False.

extract(kg, entities, verbose=0)

Fits the provided sampling strategy and then calls the private _extract method that is implemented for each of the walking strategies.

Parameters
  • kg (KG) – The Knowledge Graph.

  • entities (List[str]) – The entities to be extracted from the Knowledge Graph.

  • verbose (int) – The verbosity level. 0 displays nothing; 1 displays the progress of walk extraction and training; 2 enables debugging output. Defaults to 0.

Return type

List[List[Tuple[str, ...]]]

Returns

The walks extracted for the provided entities, as one list of walks per entity.

Raises

WalkerNotSupported – If there is an attempt to use an unsupported walking strategy with a remote Knowledge Graph.

class pyrdf2vec.walkers.WalkletWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8)

Bases: pyrdf2vec.walkers.random.RandomWalker

Walklets walking strategy which transforms randomly extracted walks into walklets, which are walks of size one or two that include the root node and potentially another vertex, which can be a predicate or an object.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

random_state

The random state used to make the walking strategy deterministic. Defaults to None.

sampler

The sampling strategy. Defaults to UniformSampler.

with_reverse

True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth and centering the entity in the walks. False otherwise. Defaults to False.
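The transformation of walks into walklets described for WalkletWalker can be sketched in a few lines. The function to_walklets is hypothetical and does not call pyrdf2vec. It assumes that a walk consisting only of the root yields a walklet of size one and that every other vertex is paired with the root.

```python
from typing import Iterable, Set, Tuple


def to_walklets(walks: Iterable[Tuple[str, ...]]) -> Set[Tuple[str, ...]]:
    """Turn each walk into walklets of size one or two that start at the root."""
    walklets: Set[Tuple[str, ...]] = set()
    for walk in walks:
        if len(walk) == 1:
            walklets.add((walk[0],))
        for vertex in walk[1:]:
            # Pair the root with a predicate or an object of the walk.
            walklets.add((walk[0], vertex))
    return walklets


walks = [("a", "p", "b"), ("a", "p", "c")]
print(sorted(to_walklets(walks)))
```

Because the result is a set, the pair formed by the root and the shared predicate "p" is stored only once even though it occurs in both walks.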