Glossary

It’s not always easy to understand all the notions used in pyRDF2Vec. This glossary is here to help you to have an idea behind all these notions:

anonymous walks

Transformation walking strategy that transforms label information into positional information.

Continuous Bag-of-Words (CBOW)

Model, part of Word2vec, that predicts target words from contextual words in a given window.

depth

Refers to the number of hops in a walk.

community hops

A hop to a node that is not a neighbor, but is rather part of the same community, which is determined through community detection.

community walks

An extraction walk strategy that allows for community hops with a certain probability.

embedding technique

Technique used in machine learning to represent complex objects (e.g., texts, images, graphs) into a vector with a reduced number of features compared to the dimension of the dataset, while keeping the most important information about them.

embeddings (or latent representation/vectors)

Numerical representation of a node in a given Knowledge Graph, where entities that are semantically related should be close to each other in the embedded space.

entity

Specific type of node in a Knowledge Graph that is characterized by a URI.

feature matrix

An NxK matrix where N is the number of entities and K the embedding size, which can be used for further downstream Machine learning (ML) tasks.

Hierarchical Random Walks (HALK)

Transformation walk strategy that removes rare entities from random walks.

Knowledge Graph (KG)

A graphical representation of (domain or expert) knowledge encoded as a collection of triples having the form (subject, predicate, object).

N-Gram walks

The transformation walk strategy based on that creates N-grams from N consecutive hops in a walk, which are then relabeled.

RDF2Vec

Unsupervised technique that can create task-agnostic numerical representations of the nodes in a Knowledge Graph by extending successful language modeling techniques.

sampling strategy

A strategy to select the next neighbor in a walk. This can either be at random or guided by some metric (biased walks).

Skip-Gram (SG)

Model, part of Word2vec, that predicts the context words from the target words in a given window.

SPARQL Query Language (SPARQL)

Declarative Query Language (e.g., SQL) for performing Data Manipulation and Data Definition operations on Data represented as a collection of RDF Language sentences/statements.

SPARQL endpoint

Point of presence identified by a URL (SPARQL Endpoint URL) and located on an HTTP network that is capable of receiving and processing requests under the SPARQL protocol.

Uniform Resource Identifier (URI)

Unique character string that identifies a particular resource, using a predefined set of syntax rules.

vertex

Node in graph which can be one of the three following types: entity, blank or literal.

walk

Sequence of vertices that can be found in the Knowledge Graph by traversing the given directed links.

walking strategy

Generates graph walks for each vertex of a given knowledge graph, from a certain depth according to a type of a strategy (type 1 for extraction or type 2 for transformation).

walklets

Transformation walking strategy with walks of length two, consisting of the root of the original walk and one of the hops.

Word2vec

Neural language modeling techniques (NLP), which takes sequences of words to embed words into vector spaces.