pyrdf2vec.walkers.split module

class pyrdf2vec.walkers.split.SplitWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, func_split=None)

Bases: pyrdf2vec.walkers.random.RandomWalker

Splitting walking strategy which splits each vertex (except the root node) present in the randomly extracted walks.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

md5_bytes

The number of bytes to keep after hashing objects with MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.
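The effect of md5_bytes can be illustrated with a minimal sketch. The helper name shorten and the hexadecimal output encoding are assumptions made for this illustration; they are not taken from pyrdf2vec, whose internal representation of the truncated digest may differ.

```python
import hashlib
from typing import Optional


def shorten(text: str, md5_bytes: Optional[int] = 8) -> str:
    """Hypothetical helper: keep only the first `md5_bytes` bytes of the MD5 digest."""
    if md5_bytes is None:
        # No hashing requested: the text is kept unchanged.
        return text
    digest = hashlib.md5(text.encode("utf-8")).digest()  # always 16 bytes
    # Truncate the digest and render it as hex (two characters per byte).
    return digest[:md5_bytes].hex()


long_literal = "a very long literal value " * 100
short = shorten(long_literal)          # 16 hex characters, whatever the input size
unchanged = shorten("abc", md5_bytes=None)
```

A smaller md5_bytes value saves more memory but raises the chance that two different texts collapse to the same truncated hash.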

random_state

The random state to use to keep the walking strategy deterministic. Defaults to None.

sampler

The sampling strategy. Defaults to UniformSampler.

with_reverse

True to extract the parent and child hops of an entity, creating (max_walks * max_walks) walks of size 2 * depth, which also places the entity at the center of the walks. False otherwise. Defaults to False.
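The walk count and walk size given for with_reverse can be sketched as follows. This is only an illustration of the arithmetic: the function centralize and the sample walks are hypothetical, and the way pyrdf2vec combines parent and child walks internally is assumed here, not confirmed.

```python
from itertools import product
from typing import List, Tuple

Walk = Tuple[str, ...]


def centralize(parent_walks: List[Walk], child_walks: List[Walk]) -> List[Walk]:
    """Hypothetical helper: pair every parent walk with every child walk.

    Parent walks are assumed to end at the entity and child walks to start
    at it, so the entity is dropped once when the two halves are joined.
    """
    return [p[:-1] + c for p, c in product(parent_walks, child_walks)]


# Two walks on each side, one hop deep (subject, predicate, object).
parents = [("a", "p1", "e"), ("b", "p2", "e")]
children = [("e", "p3", "c"), ("e", "p4", "d")]

combined = centralize(parents, children)
# 2 * 2 combined walks, each spanning two hops with "e" in the middle.
```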

func_split

The function to call to split the vertices. A reimplementation must respect the signature of the basic_split function. Defaults to basic_split.

basic_split(walks)

Splits the vertices of the random walks extracted for an entity. To achieve this, each vertex (except the root node) is split according to symbols and capitalization, and any duplicates are removed.

Some examples:

('http://dl-learner.org/carcinogenesis#d19'),

-> ('http://dl-learner.org/carcinogenesis#d19', 'has', 'bond', '3209')

('http://dl-learner.org/carcinogenesis#d19', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://dl-learner.org/carcinogenesis#Compound')

-> ('http://dl-learner.org/carcinogenesis#d19', 'type', 'compound')

Parameters

walks (List[Tuple[Any, ...]]) – The randomly extracted walks.

Return type

Set[Tuple[str, ...]]

Returns

The set of tuples that contains the split walks.
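The splitting behaviour described for basic_split can be approximated with the sketch below. It is a simplified stand-in and not the library's implementation: it works on plain strings instead of vertex objects, the helper names local_name, split_tokens and split_walks are hypothetical, and the hasBond and bond3209 URIs in the first walk are assumed inputs chosen for illustration.

```python
import re
from typing import Iterable, List, Set, Tuple


def local_name(uri: str) -> str:
    """Return the text after the last '#' or '/', or the input if that is empty."""
    return re.split(r"[#/]", uri)[-1] or uri


def split_tokens(name: str) -> List[str]:
    """Split on symbols, capitalization, and letter/digit boundaries; lowercase."""
    tokens: List[str] = []
    for chunk in re.split(r"[^A-Za-z0-9]+", name):
        # Acronyms, capitalized or lowercase words, and digit runs.
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk)
    return [token.lower() for token in tokens]


def split_walks(walks: Iterable[Tuple[str, ...]]) -> Set[Tuple[str, ...]]:
    """Keep the root vertex as is, split every other vertex, drop duplicates."""
    result: Set[Tuple[str, ...]] = set()
    for walk in walks:
        tokens = [walk[0]]
        for vertex in walk[1:]:
            tokens.extend(split_tokens(local_name(str(vertex))))
        # dict.fromkeys removes duplicates while preserving first-seen order.
        result.add(tuple(dict.fromkeys(tokens)))
    return result


root = "http://dl-learner.org/carcinogenesis#d19"
walks = [
    # Assumed predicate and object URIs, for illustration only.
    (root, "http://dl-learner.org/carcinogenesis#hasBond",
     "http://dl-learner.org/carcinogenesis#bond3209"),
    (root, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://dl-learner.org/carcinogenesis#Compound"),
]
split = split_walks(walks)
```

A function with this shape (a list of walks in, a set of string tuples out) is the kind of callable the func_split attribute expects, although a real replacement would need to handle the vertex objects that pyrdf2vec passes in.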