pyrdf2vec.walkers.halk module

class pyrdf2vec.walkers.halk.HALKWalker(max_depth, max_walks=None, sampler=NOTHING, n_jobs=None, *, with_reverse=False, random_state=None, md5_bytes=8, freq_thresholds=NOTHING)

Bases: pyrdf2vec.walkers.random.RandomWalker

HALK walking strategy that removes rare vertices from randomly extracted walks, which improves the quality of the generated embeddings while reducing memory usage.

_is_support_remote

True if the walking strategy can be used with a remote Knowledge Graph, False otherwise. Defaults to True.

freq_thresholds

The minimum frequency thresholds that a (predicate, object) hop must reach to be kept. Beware that supplying several freq_thresholds extracts more walks, which is not always desirable. Defaults to [0.01].

kg

The global KG used later on for the worker process. Defaults to None.

max_depth

The maximum depth of one walk.

max_walks

The maximum number of walks per entity. Defaults to None.

md5_bytes

The number of bytes to keep after hashing objects with MD5. Hashing reduces the memory occupied by long text. If md5_bytes is None, no hash is applied. Defaults to 8.

random_state

The random state used to make the walking strategy deterministic. Defaults to None.

sampler

The sampling strategy. Defaults to UniformSampler.

with_reverse

True to extract both parent and child hops from an entity, creating (max_walks * max_walks) walks of 2 * depth and placing the entity at the center of the walks. False otherwise. Defaults to False.
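The effect of md5_bytes described above can be illustrated with a minimal sketch. This is not the library's code: shorten_object is a hypothetical helper, and returning the truncated digest as a hexadecimal string is an assumption made for readability.

```python
import hashlib
from typing import Optional


def shorten_object(text: str, md5_bytes: Optional[int] = 8) -> str:
    """Hypothetical helper: hash `text` with MD5 and keep the first bytes.

    Mirrors the documented behaviour only in spirit: with `md5_bytes=None`
    the text is returned unchanged, otherwise it is replaced by a short,
    fixed-size digest prefix (hex-encoded here, which is an assumption).
    """
    if md5_bytes is None:
        return text
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return digest[:md5_bytes].hex()


# A long literal shrinks to a fixed-size token of 2 * md5_bytes hex characters.
long_literal = "A very long textual description of an entity. " * 50
token = shorten_object(long_literal)
unchanged = shorten_object(long_literal, md5_bytes=None)
```

Keeping fewer bytes saves more memory but raises the chance that two different objects collapse to the same token.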

build_dictionary(walks)

Builds a dictionary that maps each predicate to the identifiers of the walks in which it appears.

Parameters

walks (List[Tuple[str, ...]]) – The walks from which to build the dictionary.

Return type

DefaultDict[str, Set[int]]

Returns

The dictionary of predicate names.
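The mapping described above can be sketched in plain Python. This is an illustration, not the library's implementation: build_index is a hypothetical name, the walks are invented sample data, and indexing every element of a walk (rather than predicates only) is an assumption.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


def build_index(walks: List[Tuple[str, ...]]) -> DefaultDict[str, Set[int]]:
    """Hypothetical stand-in: map each walk element to the walk indices
    in which it occurs."""
    index: DefaultDict[str, Set[int]] = defaultdict(set)
    for walk_id, walk in enumerate(walks):
        for element in walk:
            index[element].add(walk_id)
    return index


# Invented sample walks: (subject, predicate, object).
walks = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:knows", "ex:Carol"),
    ("ex:Alice", "ex:worksAt", "ex:Acme"),
]
index = build_index(walks)
```

Storing walk indices in a set means an element that is repeated within one walk is still counted once for that walk.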

get_rare_predicates(vertex_to_windices, walks, freq_threshold)

Gets the vertices that do not reach a given threshold of frequency of occurrence.

Parameters
  • vertex_to_windices (DefaultDict[str, Set[int]]) – The dictionary that maps each predicate to the identifiers of the walks in which it appears.

  • walks (List[Tuple[str, ...]]) – The walks.

  • freq_threshold (float) – The threshold frequency of occurrence.

Return type

Set[str]

Returns

The infrequent vertices.
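The frequency filter described above can be sketched as follows. This is not the library's code: rare_elements is a hypothetical name, the sample walks are invented, and two points are assumptions: frequency is taken as the share of walks containing the element, and the comparison is strict (an element exactly at the threshold is kept).

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


def rare_elements(
    element_to_windices: DefaultDict[str, Set[int]],
    walks: List[Tuple[str, ...]],
    freq_threshold: float,
) -> Set[str]:
    """Hypothetical stand-in: return elements found in fewer than
    `freq_threshold` (as a fraction) of the walks."""
    if not walks:
        return set()
    total = len(walks)
    return {
        element
        for element, walk_ids in element_to_windices.items()
        if len(walk_ids) / total < freq_threshold
    }


# Invented sample walks and their element-to-walk-index mapping.
walks = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:knows", "ex:Carol"),
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:worksAt", "ex:Acme"),
]
index: DefaultDict[str, Set[int]] = defaultdict(set)
for walk_id, walk in enumerate(walks):
    for element in walk:
        index[element].add(walk_id)

# With a threshold of 0.5, anything present in fewer than 2 of the 4 walks is rare.
rare = rare_elements(index, walks, freq_threshold=0.5)
```

In this sample, ex:Bob appears in exactly half of the walks, so it is kept under the strict comparison assumed here.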