pyrdf2vec.embedders package

Module contents

class pyrdf2vec.embedders.Embedder

Bases: object

Base class of the embedding techniques.

abstract fit(corpus, is_update=False)

Fits a model based on the provided corpus.

Parameters

  • corpus (List[List[Tuple[str, ...]]]) – The corpus to fit the model on.

  • is_update (bool) – True if the new corpus should be added to the old model’s corpus, False otherwise. Defaults to False.

Return type

Embedder

Returns

The fitted model, according to the chosen embedding technique.

Raises

NotImplementedError – If this method is called without having been implemented in a subclass.

abstract transform(entities)

Constructs the feature vectors of the provided entities.

Parameters

entities (List[str]) – The entities (including any test entities) for which to create embeddings. Since RDF2Vec is unsupervised, there is no label leakage.

Return type

List[str]

Returns

The feature vectors of the provided entities.

Raises

NotImplementedError – If this method is called without having been implemented in a subclass.
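
Because fit and transform are abstract, a concrete embedder must implement both. The following minimal sketch shows the expected shape of such a subclass; the CountEmbedder class and its counting logic are purely illustrative and not part of pyrdf2vec:

   from typing import List, Tuple

   from pyrdf2vec.embedders import Embedder


   class CountEmbedder(Embedder):
       """Hypothetical embedder: one count-based feature per entity."""

       def __init__(self):
           self._counts = {}

       def fit(self, corpus: List[List[Tuple[str, ...]]], is_update: bool = False) -> Embedder:
           if not is_update:
               self._counts = {}
           # Count how often each token appears across all walks.
           for entity_walks in corpus:
               for walk in entity_walks:
                   for token in walk:
                       self._counts[token] = self._counts.get(token, 0) + 1
           return self

       def transform(self, entities: List[str]) -> List[List[int]]:
           # A (trivial, one-dimensional) feature vector per entity.
           return [[self._counts.get(entity, 0)] for entity in entities]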

class pyrdf2vec.embedders.FastText(**kwargs)

Bases: pyrdf2vec.embedders.embedder.Embedder

Defines the FastText embedding technique.

SEE: https://radimrehurek.com/gensim/models/fasttext.html

The RDF2Vec implementation of FastText does not consider the min_n and max_n parameters for n-gram splitting.

Instead, this implementation computes n-grams for walks only by splitting the URIs of subjects and predicates on the “#” symbol. Since objects are MD5-encoded, splitting them into n-grams would not be meaningful.

If you want a different split strategy for computing the n-grams of entities, provide your own compute_ngrams_bytes function to FastText (see the sketch after the attribute list below).

_model

The gensim.models.fasttext model. Defaults to None.

kwargs

The keyword arguments dictionary. Defaults to { bucket=2000000, min_count=0, max_n=0, min_n=0, negative=20, vector_size=500 }.

func_computing_ngrams

The function to call for computing n-grams. A reimplementation must respect the signature imposed by gensim: func(entity: str, minn: int = 0, maxn: int = 0) -> List[bytes]. Defaults to compute_ngrams_bytes.
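
As a hedged sketch of such a reimplementation: the function below treats the last path segment of a URI as a single n-gram, splitting on “/” instead of “#”. Both the splitting rule and the assumption that func_computing_ngrams can be supplied through the constructor’s **kwargs are illustrative:

   from typing import List

   from pyrdf2vec.embedders import FastText


   def split_on_slash(entity: str, minn: int = 0, maxn: int = 0) -> List[bytes]:
       # Hypothetical strategy: keep only the last "/"-separated segment.
       return [entity.rsplit("/", maxsplit=1)[-1].encode("utf-8")]


   fasttext = FastText(func_computing_ngrams=split_on_slash)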

fit(walks, is_update=False)

Fits the FastText model based on provided walks.

Parameters
  • walks (List[List[Tuple[str, ...]]]) – The walks used to build the corpus that fits the model.

  • is_update (bool) – True if the new corpus should be added to the old model’s walks, False otherwise. Defaults to False.

Return type

Embedder

Returns

The fitted FastText model.

transform(entities)

Constructs the feature vectors of the provided entities.

Parameters

entities (List[str]) – The entities (including any test entities) for which to create embeddings. Since RDF2Vec is unsupervised, there is no label leakage.

Return type

List[str]

Returns

The feature vectors of the provided entities.
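
A hedged usage sketch; the toy walks below are placeholders for walks that a pyrdf2vec walker would normally produce:

   from pyrdf2vec.embedders import FastText

   # Hypothetical walks: one list of walks per entity, each walk a tuple of URIs.
   walks = [
       [("http://example.org/Alice#knows", "http://example.org/Bob#name")],
   ]

   embedder = FastText().fit(walks)
   embeddings = embedder.transform(["http://example.org/Alice#knows"])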

class pyrdf2vec.embedders.Word2Vec(**kwargs)

Bases: pyrdf2vec.embedders.embedder.Embedder

Defines the Word2Vec embedding technique.

SEE: https://radimrehurek.com/gensim/models/word2vec.html

_model

The gensim.models.word2vec model. Defaults to None.

kwargs

The keyword arguments dictionary. Defaults to { min_count=0 }.

fit(walks, is_update=False)

Fits the Word2Vec model based on provided walks.

Parameters
  • walks (List[List[Tuple[str, ...]]]) – The walks used to build the corpus that fits the model.

  • is_update (bool) – True if the new walks should be added to the old model’s walks, False otherwise. Defaults to False.

Return type

Embedder

Returns

The fitted Word2Vec model.
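
A hedged sketch of incremental fitting with is_update; both walk lists are placeholders for walker output:

   from pyrdf2vec.embedders import Word2Vec

   first_walks = [[("http://example.org/A", "http://example.org/B")]]
   new_walks = [[("http://example.org/C", "http://example.org/D")]]

   embedder = Word2Vec(vector_size=16)
   embedder.fit(first_walks)                # builds the initial model
   embedder.fit(new_walks, is_update=True)  # adds the new walks to the old model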

transform(entities)

Constructs the feature vectors of the provided entities.

Parameters

entities (List[str]) – The entities (including any test entities) for which to create embeddings. Since RDF2Vec is unsupervised, there is no label leakage.

Return type

List[str]

Returns

The feature vectors of the provided entities.
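
In practice, an embedder is usually passed to RDF2VecTransformer rather than called directly. A hedged end-to-end sketch, in which the SPARQL endpoint and entity are placeholders:

   from pyrdf2vec import RDF2VecTransformer
   from pyrdf2vec.embedders import Word2Vec
   from pyrdf2vec.graphs import KG
   from pyrdf2vec.walkers import RandomWalker

   transformer = RDF2VecTransformer(
       embedder=Word2Vec(vector_size=200),
       walkers=[RandomWalker(max_depth=4, max_walks=10)],
   )
   # Placeholder endpoint and entity.
   embeddings, literals = transformer.fit_transform(
       KG("https://dbpedia.org/sparql"),
       ["http://dbpedia.org/resource/Brussels"],
   )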