Glossary

The notions used in pyRDF2Vec are not always easy to grasp. This glossary gives a short definition of each of them:

anonymous walks: Transformation walking strategy that transforms label information into positional information.
Continuous Bag-of-Words (CBOW): Model, part of Word2vec, that predicts a target word from the context words in a given window.
depth: Refers to the number of hops in a walk.
community hops: A hop to a node that is not a neighbor, but is rather part of the same community, which is determined through community detection.
community walks: An extraction walk strategy that allows for community hops with a certain probability.
embedding technique: Technique used in machine learning to represent complex objects (e.g., texts, images, graphs) as a vector with a reduced number of features compared to the dimension of the dataset, while keeping the most important information about them.
embeddings (or latent representation/vectors): Numerical representation of a node in a given Knowledge Graph, where entities that are semantically related should be close to each other in the embedded space.
entity: Specific type of node in a Knowledge Graph that is characterized by a URI.
feature matrix: An N×K matrix, where N is the number of entities and K the embedding size, which can be used for downstream machine learning (ML) tasks.
Hierarchical Random Walks (HALK): Transformation walk strategy that removes rare entities from random walks.
Knowledge Graph (KG): A graphical representation of (domain or expert) knowledge encoded as a collection of triples having the form (subject, predicate, object).
N-Gram walks: Transformation walk strategy that creates N-grams from N consecutive hops in a walk, which are then relabeled.
RDF2Vec: Unsupervised technique that can create task-agnostic numerical representations of the nodes in a Knowledge Graph by extending successful language modeling techniques.
sampling strategy: A strategy to select the next neighbor in a walk. This can either be at random or guided by some metric (biased walks).
Skip-Gram (SG): Model, part of Word2vec, that predicts the context words from a target word in a given window.
SPARQL Query Language (SPARQL): Declarative query language (comparable to SQL for relational databases) for performing data manipulation and data definition operations on data represented as a collection of RDF statements.
SPARQL endpoint: Point of presence identified by a URL (SPARQL Endpoint URL) and located on an HTTP network that is capable of receiving and processing requests under the SPARQL protocol.
Uniform Resource Identifier (URI): Unique character string that identifies a particular resource, using a predefined set of syntax rules.
vertex: Node in a graph, which can be one of the following three types: entity, blank node, or literal.
walk: Sequence of vertices that can be found in the Knowledge Graph by traversing the given directed links.
walking strategy: Generates graph walks for each vertex of a given Knowledge Graph, up to a certain depth, according to the type of the strategy (type 1 for extraction or type 2 for transformation).
walklets: Transformation walking strategy with walks of length two, consisting of the root of the original walk and one of the hops.
Word2vec: Neural language modeling technique from Natural Language Processing (NLP) that takes sequences of words and embeds the words into a vector space.
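
Several of the notions above (triples, walk, depth, anonymous walks, walklets) may be easier to picture with a small example. The following is a minimal sketch in plain Python that does not use the pyRDF2Vec API. The toy triples and the helper names `build_adjacency`, `extract_walks`, `anonymize`, and `walklets` are hypothetical and exist only for illustration. The sketch also assumes that one hop follows one directed link and therefore appends a predicate and an object to the walk, and that an anonymous walk keeps its root and replaces every other label with the position of its first occurrence.

```python
from collections import defaultdict

# Toy Knowledge Graph as (subject, predicate, object) triples (made-up data).
TRIPLES = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "ex:worksAt", "ex:Acme"),
    ("ex:Alice", "ex:livesIn", "ex:Ghent"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) pairs."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def extract_walks(adjacency, root, depth):
    """Enumerate every walk that starts at `root` and has at most `depth` hops."""
    walks = [(root,)]
    for _ in range(depth):
        extended = []
        for walk in walks:
            neighbors = adjacency.get(walk[-1], [])
            if not neighbors:
                extended.append(walk)  # dead end: keep the shorter walk
            for predicate, obj in neighbors:
                extended.append(walk + (predicate, obj))
        walks = extended
    return walks


def anonymize(walk):
    """Anonymous walk: keep the root, turn other labels into positions."""
    first_seen = {}
    for position, label in enumerate(walk):
        first_seen.setdefault(label, position)
    return (walk[0],) + tuple(str(first_seen[label]) for label in walk[1:])


def walklets(walk):
    """Walklets: pairs made of the root and one later element of the walk."""
    return [(walk[0], hop) for hop in walk[1:]]


adjacency = build_adjacency(TRIPLES)
for walk in extract_walks(adjacency, "ex:Alice", depth=2):
    print("walk:     ", walk)
    print("anonymous:", anonymize(walk))
    print("walklets: ", walklets(walk))
```

Exhaustive enumeration is used here only because the toy graph is tiny. On a real Knowledge Graph, a sampling strategy would typically limit how many neighbors are followed at each hop.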