pyrdf2vec.walkers.split module¶
- class pyrdf2vec.walkers.split.SplitWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, func_split=None)¶
Bases: pyrdf2vec.walkers.random.RandomWalker
Splitting walking strategy which splits each vertex (except the root node) present in the randomly extracted walks.
- _is_support_remote¶
True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.
- kg¶
The global KG used later on for the worker process. Defaults to None.
- max_depth¶
The maximum depth of one walk.
- max_walks¶
The maximum number of walks per entity. Defaults to None.
- md5_bytes¶
The number of bytes to keep after hashing objects with MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.
- random_state¶
The random state to use so that the walking strategy stays deterministic. Defaults to None.
- sampler¶
The sampling strategy. Defaults to UniformSampler.
- with_reverse¶
True to extract parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth, which also places this entity at the center of the walks. False otherwise. Defaults to False.
- func_split¶
The function to call to split the vertices. A custom implementation must respect the signature imposed by the basic_split function. Defaults to None (per the class signature), in which case basic_split is used.
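The effect of the md5_bytes attribute described above can be illustrated with a minimal sketch. It assumes the hash is computed over the UTF-8 encoded name and then truncated to the first md5_bytes bytes of the digest; the helper name shorten is hypothetical and is not part of pyrdf2vec.

```python
import hashlib
from typing import Optional, Union


def shorten(name: str, md5_bytes: Optional[int] = 8) -> Union[str, bytes]:
    """Hypothetical helper: truncate the MD5 digest of a vertex name.

    Assumption: the name is UTF-8 encoded before hashing and only the
    first `md5_bytes` bytes of the 16-byte digest are kept.
    """
    if md5_bytes is None:
        # No hashing requested: keep the original text as-is.
        return name
    return hashlib.md5(name.encode("utf-8")).digest()[:md5_bytes]


long_literal = "a very long literal value " * 100
hashed = shorten(long_literal)           # fixed-size bytes, whatever the input length
unhashed = shorten(long_literal, None)   # unchanged string
```

A smaller md5_bytes saves more memory but raises the chance that two different texts collide on the same truncated digest.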
- basic_split(walks)¶
Splits the vertices of the random walks extracted for an entity. To achieve this, each vertex (except the root node) is split on symbols and capitalization, and duplicates are removed.
Some examples:
- ('http://dl-learner.org/carcinogenesis#d19'),
  -> ('http://dl-learner.org/carcinogenesis#d19', 'has', 'bond', '3209')
- ('http://dl-learner.org/carcinogenesis#d19', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://dl-learner.org/carcinogenesis#Compound')
  -> ('http://dl-learner.org/carcinogenesis#d19', 'type', 'compound')
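The splitting behaviour described for basic_split can be approximated in plain Python. This is only a sketch, not the library's implementation: it assumes walks are tuples of strings rather than the library's own vertex objects, it reads "symbols and capitalization" as "take the part after the last # or /, then break it on case changes and digit runs", and it reads "removing any duplication" as dropping repeated tokens within a walk. The names split_vertex and split_walk are hypothetical.

```python
import re
from typing import List, Tuple


def split_vertex(name: str) -> List[str]:
    """Hypothetical helper: split one vertex name into lowercase tokens."""
    # Keep only the local part after the last '#' or '/'.
    local = re.split(r"[#/]", name)[-1]
    # A token is a (possibly capitalized) word, an all-caps run, or digits.
    tokens = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", local)
    return [token.lower() for token in tokens]


def split_walk(walk: Tuple[str, ...]) -> Tuple[str, ...]:
    """Hypothetical helper: split every vertex of a walk except the root."""
    root, rest = walk[0], walk[1:]
    tokens: List[str] = []
    for name in rest:
        tokens.extend(split_vertex(name))
    # dict.fromkeys drops repeated tokens while preserving their order.
    return (root, *dict.fromkeys(tokens))


walk = (
    "http://dl-learner.org/carcinogenesis#d19",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://dl-learner.org/carcinogenesis#Compound",
)
result = split_walk(walk)
```

With the second example from the docstring as input, result is the root URI followed by 'type' and 'compound'. A custom function passed as func_split would need to match whatever signature basic_split actually has in the installed version, which this sketch does not attempt to reproduce.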