pyrdf2vec.embedders package
Submodules
Module contents
- class pyrdf2vec.embedders.Embedder
Bases: object
Base class of the embedding techniques.
- abstract fit(corpus, is_update=False)
Fits a model based on the provided corpus.
- abstract transform(entities)
Constructs a feature vector for each of the provided entities.
- Parameters
entities (List[str]) – The entities, including test entities, for which to create the embeddings. Since RDF2Vec is unsupervised, there is no label leakage.
- Return type
- Returns
The feature vectors of the provided entities.
- Raises
NotImplementedError – If this method is called without an implementation having been provided.
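As a rough illustration of the interface described above, the following sketch defines a stand-in abstract base class with the same two methods, plus a toy subclass. `EmbedderSketch` and `HashEmbedder` are hypothetical names that do not exist in pyrdf2vec. The hash-based vectors are placeholders. Returning `self` from `fit` and treating `is_update` as "keep what was learned before" are assumptions made for the example.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List, Set


class EmbedderSketch(ABC):
    """Stand-in mirroring the documented fit/transform interface (hypothetical)."""

    @abstractmethod
    def fit(self, corpus: List[List[str]], is_update: bool = False):
        raise NotImplementedError("This must be implemented!")

    @abstractmethod
    def transform(self, entities: List[str]) -> List[List[float]]:
        raise NotImplementedError("This must be implemented!")


class HashEmbedder(EmbedderSketch):
    """Toy embedder: each known entity gets a vector derived from its MD5 digest."""

    def __init__(self, size: int = 4) -> None:
        self.size = size  # number of vector components (at most 16 for MD5)
        self._vocab: Set[str] = set()

    def fit(self, corpus, is_update=False):
        # Assumption: is_update=True extends the existing vocabulary,
        # is_update=False starts from scratch.
        if not is_update:
            self._vocab = set()
        for walk in corpus:
            self._vocab.update(walk)
        return self

    def transform(self, entities):
        missing = [e for e in entities if e not in self._vocab]
        if missing:
            raise ValueError(f"Entities not seen during fit: {missing}")
        vectors = []
        for entity in entities:
            digest = hashlib.md5(entity.encode("utf-8")).digest()
            # Scale the first `size` bytes of the digest into [0, 1].
            vectors.append([b / 255.0 for b in digest[: self.size]])
        return vectors
```

Instantiating `EmbedderSketch` directly fails with a `TypeError`, because both methods are abstract. Only a subclass that implements both can be created.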
- class pyrdf2vec.embedders.FastText(**kwargs)
Bases: pyrdf2vec.embedders.embedder.Embedder
Defines the FastText embedding technique.
SEE: https://radimrehurek.com/gensim/models/fasttext.html
The RDF2Vec implementation of FastText does not use the min_n and max_n parameters for n-gram splitting.
Instead, it computes the n-grams of a walk by splitting the URIs of subjects and predicates on the “#” symbol. Objects are not split: because they are encoded as MD5 hashes, splitting them into n-grams would not make sense.
You will likely want a different split strategy for computing the n-grams of the entities. In that case, provide your own compute_ngrams_bytes function to FastText.
- _model
The gensim.models.fasttext model. Defaults to None.
- kwargs
The keyword arguments dictionary. Defaults to { bucket=2000000, min_count=0, max_n=0, min_n=0, negative=20, vector_size=500 }.
- func_computing_ngrams
The function to call for the computation of n-grams. A reimplementation must respect the signature imposed by gensim: func(entity: str, minn: int = 0, maxn: int = 0) -> List[bytes]. Defaults to compute_ngrams_bytes.
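A custom n-gram function only needs to match the signature given above. The following is a minimal sketch of one possible split strategy. The name `split_uri_ngrams` and its behaviour (the whole entity plus its non-empty “#”-separated parts) are assumptions made for illustration, and this is not the library's compute_ngrams_bytes.

```python
from typing import List


def split_uri_ngrams(entity: str, minn: int = 0, maxn: int = 0) -> List[bytes]:
    """Hypothetical n-gram function: the whole entity plus its "#"-separated parts.

    minn and maxn are accepted only to match the documented signature;
    this sketch ignores them.
    """
    ngrams = [entity.encode("utf-8")]
    if "#" in entity:
        for part in entity.split("#"):
            if part:  # skip empty fragments such as the one after a trailing "#"
                ngrams.append(part.encode("utf-8"))
    return ngrams


print(split_uri_ngrams("http://example.org/onto#Person"))
```

Entities without a “#”, such as MD5-encoded objects, come back as a single n-gram, which matches the reasoning given in the class description.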
- fit(walks, is_update=False)
Fits the FastText model based on the provided walks.
- class pyrdf2vec.embedders.Word2Vec(**kwargs)
Bases: pyrdf2vec.embedders.embedder.Embedder
Defines the Word2Vec embedding technique.
SEE: https://radimrehurek.com/gensim/models/word2vec.html
- _model
The gensim.models.word2vec model. Defaults to None.
- kwargs
The keyword arguments dictionary. Defaults to { min_count=0 }.
- fit(walks, is_update=False)
Fits the Word2Vec model based on the provided walks.
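Both embedders document a kwargs dictionary with default values. One plausible way for such defaults to combine with user-supplied keyword arguments is plain dictionary unpacking, sketched below. The helper `merge_kwargs` and the two constant names are hypothetical, and whether pyrdf2vec merges its defaults this way is an assumption. Only the default values themselves come from the attribute descriptions above.

```python
from typing import Any, Dict

# Default values as listed in the attribute descriptions above.
WORD2VEC_DEFAULTS: Dict[str, Any] = {"min_count": 0}
FASTTEXT_DEFAULTS: Dict[str, Any] = {
    "bucket": 2000000,
    "min_count": 0,
    "max_n": 0,
    "min_n": 0,
    "negative": 20,
    "vector_size": 500,
}


def merge_kwargs(defaults: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: user-supplied values override the defaults."""
    return {**defaults, **kwargs}


print(merge_kwargs(WORD2VEC_DEFAULTS, vector_size=100))
print(merge_kwargs(FASTTEXT_DEFAULTS, negative=5)["negative"])
```

Under this scheme, a key the caller passes replaces the default with the same name, and every other default is kept.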