pyrdf2vec.walkers.halk module
- class pyrdf2vec.walkers.halk.HALKWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, freq_thresholds=NOTHING)
Bases: pyrdf2vec.walkers.random.RandomWalker
HALK walking strategy, which removes rare vertices from randomly extracted walks. This increases the quality of the generated embeddings while decreasing memory usage.
- _is_support_remote
True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.
- freq_thresholds
The minimum frequency thresholds that a (predicate, object) hop must reach to be kept. Beware that using several freq_thresholds extracts more walks, which is not always desirable. Defaults to [0.01].
- kg
The global KG used later on for the worker process. Defaults to None.
- max_depth
The maximum depth of one walk.
- max_walks
The maximum number of walks per entity. Defaults to None.
- md5_bytes
The number of bytes to keep after hashing objects with MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.
- random_state
The random state to use so that the walking strategy gives reproducible results. Defaults to None.
- sampler
The sampling strategy. Defaults to UniformSampler.
- with_reverse
True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth, which also places this entity at the center of the walks. False otherwise. Defaults to False.
- build_dictionary(walks)
Builds a dictionary that maps each predicate to the identifier(s) of the walk(s) in which it appears.
- get_rare_predicates(vertex_to_windices, walks, freq_threshold)
Gets the vertices that do not reach a given threshold of frequency of occurrence.
- Parameters
- Return type
- Returns
the infrequent vertices.
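
The two methods above can be illustrated with a minimal, library-independent sketch. It assumes that a walk is a plain tuple of strings and that a vertex is "rare" when the fraction of walks containing it is strictly below the threshold. The helper names `get_rare_vertices` and `drop_rare` and the sample walks are hypothetical, and the sketch is not the library's actual implementation.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Set, Tuple

# Assumption: a walk is a flat tuple of vertex/predicate names.
Walk = Tuple[str, ...]


def build_dictionary(walks: Sequence[Walk]) -> DefaultDict[str, Set[int]]:
    """Map every element of the walks to the indices of the walks containing it."""
    vertex_to_windices: DefaultDict[str, Set[int]] = defaultdict(set)
    for index, walk in enumerate(walks):
        for vertex in walk:
            vertex_to_windices[vertex].add(index)
    return vertex_to_windices


def get_rare_vertices(
    vertex_to_windices: DefaultDict[str, Set[int]],
    walks: Sequence[Walk],
    freq_threshold: float,
) -> Set[str]:
    """Return the elements that occur in less than freq_threshold of all walks."""
    n_walks = len(walks)
    return {
        vertex
        for vertex, windices in vertex_to_windices.items()
        if len(windices) / n_walks < freq_threshold
    }


def drop_rare(walks: Sequence[Walk], rare: Set[str]) -> List[Walk]:
    """Hypothetical helper: remove every rare element from each walk."""
    return [tuple(v for v in walk if v not in rare) for walk in walks]


# Made-up sample walks, all starting from the same entity.
walks = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("alice", "likes", "tea"),
    ("alice", "knows", "dave"),
]
index = build_dictionary(walks)
rare = get_rare_vertices(index, walks, freq_threshold=0.5)
filtered = drop_rare(walks, rare)
```

With this sample, only "alice" and "knows" appear in at least half of the walks, so every other element is dropped. Running the same filter once per threshold and keeping every result is one way to see why several freq_thresholds yield more walks.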