Semantic Web

Pierre-Henri Paris

June 2023

Introduction

Why graphs?

Graphs are a general language for describing and analyzing entities with relations/interactions

[Figures omitted. Source: Stanford]

Knowledge graphs

Semantic Perspective

A knowledge graph is a formal representation of knowledge, where entities are represented as nodes, and the relationships between them are represented as edges. These relationships are often imbued with semantic meaning, drawn from controlled vocabularies, taxonomies, or ontologies, allowing for rich, contextual understanding of the data.

Data Integration Perspective

A knowledge graph is a framework for integrating heterogeneous, distributed, and complex data into a unified, easy-to-understand visual structure. It creates a single ‘source of truth’ where relationships and connections between disparate data points can be analyzed and exploited.

Machine Learning Perspective

A knowledge graph is a structured and interconnected data repository that can be used to enhance machine learning algorithms. It provides contextual and relational data that improve algorithmic accuracy and prediction capability, particularly in tasks like recommendation systems or natural language understanding.

Business Intelligence Perspective

A knowledge graph is a tool for turning data into actionable insights. It structures and connects data in ways that align with business objectives, thereby supporting decision-making, predictive analytics, and operational efficiency.

User Interface Perspective

A knowledge graph is a means of presenting complex data in a visually intuitive and interactive way, making it easier for users to navigate, explore, and derive insights from large data sets.

Logic

What is reasoning?

  • Inductive reasoning
    • “If it has wheels, doors, seats, windows, engine, it must be a car.”
  • Deductive reasoning
    • “All cars are vehicles. My Lada™ is a car. Therefore, my Lada™ is a vehicle.”
  • Abductive reasoning
    • “My car is not in my garage and my wife is at home. It may have been stolen!”
  • In this course, we mainly focus on deductive reasoning!

What is an ontology?

  • Captures the domain of discourse of a particular field
  • Structured representation (with symbols)
  • Concepts, categories, properties, and relationships
  • Shared understanding, facilitate interoperability, and support reasoning

Logic Programming

In Datalog / Prolog style

Symbols: RobertDowneyJr, NewYork, SanFrancisco
Predicates: lives/2, born/2
Dataset: lives(RobertDowneyJr, SanFrancisco), born(RobertDowneyJr, NewYork)

Logically consistent collection of facts
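As a rough illustration of the Datalog-style dataset above, the facts can be held as plain tuples and queried with wildcards. This is a minimal sketch in standard Python, not a Datalog engine; the helper name query and the tuple layout are assumptions made for the example.

```python
# Facts from the slide, stored as (predicate, subject, object) tuples.
FACTS = {
    ("lives", "RobertDowneyJr", "SanFrancisco"),
    ("born", "RobertDowneyJr", "NewYork"),
}


def query(predicate, subject=None, obj=None):
    """Return the matching facts, sorted; None acts as a wildcard."""
    return sorted(
        fact
        for fact in FACTS
        if fact[0] == predicate
        and subject in (None, fact[1])
        and obj in (None, fact[2])
    )


# "Where does RobertDowneyJr live?"
lives_in = [place for (_, _, place) in query("lives", subject="RobertDowneyJr")]
print(lives_in)
```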

Description Logics

Based on logical formalisms, e.g., Description Logics (DL), RDFS, OWL

  • TBox (schema, ontology, theory):
    • SuccessfulAuthor ⊑ ≥1 notableWork.Bestseller
  • ABox (instances, facts, assertions):
    • SuccessfulAuthor(StanLee)
  • RBox (role axioms, i.e., axioms about properties):
    • notableWork ⊑ created

Databases and Data Integration

Author   | Notable work | Date of birth
Stan Lee | Iron Man     | 12/28/1922
Bob Kane | Batman       | 10/24/1915
  • Entities
    • Cells - classes, instances
  • Relations
    • Column headers

NLP

Building KGs from texts

Albert Einstein was a German-born theoretical physicist who developed the theory of relativity.

Einstein

Named Entity Recognition

Coco, studying at Sorbonne, designs for Chanel in Paris, France, dreaming of the Champs-Élysées. Meanwhile, Paris of Troy admires Paris, Texas’s own Eiffel Tower, and a couple explores cobblestoned Paris, Ontario, inspired by Paris Hilton.

Relation Linking

  • Name all the movies in which Robert Downey Jr acted?
  • Find me all the films casting Robert Downey Jr ?
  • List all the movies starring Robert Downey Junior?
  • RDJ has acted in which movies?

Question Answering

How many Marvel movies was Robert Downey Jr. cast in?


SELECT (COUNT(?uri) AS ?count) WHERE {
  ?uri dbp:studio dbr:Marvel_Studios .
  ?uri dbo:starring dbr:Robert_Downey_Jr .
}

Language Modeling

Robert Downey Jr. portrayed [MASK] in the Marvel movie in 2008.

Knowledge Graph

  • Precise facts
  • Entities & relations
  • Explainability

Unstructured Sources

  • Large-scale text corpora
    • Wikipedia,
    • OpenBooks,
    • Reddit,
    • CommonCrawl,
    • etc.

Examples of knowledge graphs

  • Google Knowledge Graph
  • Amazon Product Graph
  • Facebook Graph API
  • IBM Watson
  • Microsoft Satori
  • Project Hanover/Literome
  • LinkedIn Knowledge Graph
  • Yandex Object Answer
  • IKEA Knowledge Graph

YAGO

DBpedia

Wikidata

LOD Cloud

Applications

Serving information:

The Beatles

Applications

Question answering and conversation agents

Source: Medium

Applications

  • information extraction,
  • semantic search,
  • knowledge injection into language models

Summary


RDF, RDFS, OWL

Based on Antoine Zimmermann's course

RDF

  • RDF is a data model (not a file format!)
  • RDF is a logical formalism (formal semantics)
  • RDF is a Web standard
  • RDF is the HTML of the Web of Data

RDF basics

  • Identify things (resources)
  • Express relations
  • Assign data values to things (literals)
  • Organise things in categories (i.e., classes or types)
  • Add simple knowledge about categories and relations

Identify things

  • RDF is used to describe resources
  • A resource may be anything (a real or imaginary entity, abstract or concrete)
  • To describe a resource, it must be named or identified
  • On the Web, the identification mechanism must be uniform at Web scale: an identifier must identify the same thing everywhere on the Web
  • RDF uses Internationalized Resource Identifiers or IRIs (RFC 3987)

Internationalized Resource Identifiers

  • IRIs generalise URIs (Uniform Resource Identifiers, RFC 3986) by allowing any UNICODE characters
  • IRIs and URIs identify things but may be used as locators (i.e., as URLs) at the same time
  • Examples:
    • urn:ietf:rfc:3987
    • svn://yadiyada.foo.bar/
    • mailto:antoine.zimmermann@emse.fr
    • ftp://ftp.liris.fr/#meta
    • http://en.wikipedia.org/wiki/User:Wikiuser100
  • Note: to shorten notations, we use namespace prefixes
    • rdf: is for http://www.w3.org/1999/02/22-rdf-syntax-ns#

How to choose an IRI for something?

  • If possible, reuse an existing IRI from an authoritative source, e.g.:
    • from a national library for books (Library of Congress, BnF, BNL, DNB)
    • from a government website for a ministry
  • If not, make your own IRI:

Relate things

  • Binary relations between things
    • “Laura loves Helmut”
    • “Steven works for Google Inc.”
  • This is written as a triple:
    (subject, predicate, object)
  • where subject and object identify the resources in the relationship, and predicate identifies the relation
  • The predicate in an RDF triple is always an IRI; the subject is an IRI or a blank node; the object is an IRI, a blank node, or a literal

Example

“Laura loves Helmut”

(http://example.org/data/Laura,         subject
  http://social.relations.com/loves,    predicate
    http://example.org/data/Helmut)      object

RDF triples

  • Compact syntax:
    • use namespace prefixes
    • write subject, predicate, and object side by side, separated by spaces
    • ex:Laura rel:loves ex:Helmut
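The compact syntax is only an abbreviation: each prefixed name stands for a full IRI. The sketch below expands a compact triple in plain Python. The prefix table is an assumption taken from the full IRIs of the "Laura loves Helmut" example, and the function names are hypothetical.

```python
# Assumed prefix table, matching the full IRIs used in the example.
PREFIXES = {
    "ex": "http://example.org/data/",
    "rel": "http://social.relations.com/",
}


def expand(prefixed_name):
    """Turn 'prefix:local' into a full IRI using PREFIXES."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


def parse_compact_triple(text):
    """Split 'subject predicate object' and expand each prefixed name."""
    subject, predicate, obj = text.split()
    return (expand(subject), expand(predicate), expand(obj))


triple = parse_compact_triple("ex:Laura rel:loves ex:Helmut")
print(triple)
```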

Data values

  • As everything else, a data value (number, string, date) is a resource
  • A specific data value can be identified with a literal, a character string that represents the value
  • Every literal is typed such that its string representation can be interpreted as the correct value
    • “42” represents the number forty-two if it is of type decimal integer, but represents sixty-six if it is a hexadecimal integer
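The dependence of the value on the datatype can be checked directly. In this small sketch the two datatype names are hypothetical labels for the example, not standard datatype IRIs.

```python
# Hypothetical datatype labels mapped to the numeric base used for parsing.
BASES = {"decimal-integer": 10, "hexadecimal-integer": 16}


def literal_value(lexical_form, datatype):
    """Interpret the same lexical form according to its datatype."""
    return int(lexical_form, BASES[datatype])


as_decimal = literal_value("42", "decimal-integer")
as_hexadecimal = literal_value("42", "hexadecimal-integer")
print(as_decimal, as_hexadecimal)
```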

RDF literals

  • An RDF literal has 2 or 3 components which are:
    • A lexical form which is a UNICODE string
    • A datatype IRI that can be any IRI
    • When the datatype IRI is rdf:langString, there is a language tag which is a BCP 47 tag
  • Usually, we use standard datatype IRIs from the xsd: namespace (XML Schema Datatypes) and the rdf: namespace
  • We will write literals "lexical form"^^datatypeIRI and when it is an rdf:langString, "lexical form"@langTag

RDF literals - Examples

  • "42"^^xsd:integer
  • "THX 1138"^^xsd:string
  • "chat"@fr, "chat"@en
  • "<p>The <em>beautiful</em> literal!</p>"^^rdf:HTML

RDF graphs

  • An RDF graph is a set of RDF triples
  • RDF graphs can be drawn as directed, edge-labelled multi-graphs

Unidentified resources

  • RDF can describe entities that are known to exist but whose identity is unknown (or is irrelevant/unimportant)
    • E.g., a book has at least an author, but they may not be known
  • The existence of a thing can be indicated in the subject or object position of a triple with a blank node
    • E.g. “something is in my bag”

The Turtle syntax (1)

  • Full IRIs: http://www.example.com/test#this
  • A simple triple:
    <http://www.example.com/test#this>
          <http://relations.example.com/in>
                  <http://www.example.com/test#box> .
  • Abbreviated IRIs (declare prefixes at the beginning of the file):
    # This is a comment
    @prefix ex: <http://www.example.com/test#> . # end dot!
    PREFIX rel: <http://relations.example.com/> # alternative notation (no dot!)
    ex:this rel:in ex:box . # dot ends statement

The Turtle syntax (2)

# Literals:
ex:this rel:date "2019-09-13"^^xsd:date . # normal literal
ex:this rel:name "this"@en . # language-tagged literal
ex:this rel:code "TX32" . # xsd:string can be omitted
ex:this rel:number 42 . # xsd:integer (no quotes)
ex:this rel:sizeInMeters 3.75 . # xsd:decimal (use a dot)
ex:this rel:isGood true . # xsd:boolean
ex:this rel:isBoring false . # xsd:boolean

# Blank nodes:
[] rel:in ex:box .
_:b1  rel:in ex:box . # a blank node identifier...
ex:me rel:likes _:b1 . # ...allows reusing the same blank node

The Turtle syntax (3)

# Repeat the same subject and predicate:
ex:box rel:contains ex:this .
ex:box rel:contains ex:that .
# can be written
ex:box rel:contains ex:this, ex:that . # comma


# Repeat subject:
ex:this rel:date "2019-09-13"^^xsd:date;
    rel:name "this"@en; # new lines are optional
    rel:code "TX32";
    rel:nextTo ex:that, ex:thoot, ex:thus .

The Turtle syntax (4)

# More on blank nodes:

# assume prefixes are declared
ex:johnDoe rel:worksFor [
    a ex:University; # the IRI rdf:type can be replaced by 'a'
    rel:name "Berkeley";
    rel:locatedIn ex:California
] .

# is the same as:
ex:johnDoe rel:worksFor _:bnode .
_:bnode rdf:type ex:University . # 'a' and 'rdf:type' represent the same IRI
_:bnode rel:name "Berkeley" .
_:bnode rel:locatedIn ex:California .

The Turtle syntax (5)

#Declaring a base IRI:
@base <http://example.com/base/> . # ends with dot
BASE <http://example.com/base/> # alternative syntax (no dot!)
# prefixes must be declared
<bob> a vocab:Person; # relative IRI
        rel:knows <claire> .
BASE <http://example.com/base2#> # base can be redefined
<#bob> rel:knows <http://example.com/base/bob> . # fragment-only relative IRI; different bobs

# is the same as:
<http://example.com/base/bob> a vocab:Person;
    rel:knows <http://example.com/base/claire> .
<http://example.com/base2#bob>
    rel:knows <http://example.com/base/bob> .

RDFS (RDF Schema)

RDFS is a semantic extension of RDF. It provides a way to describe semantic relationships between things and a basic type system for RDF models.

Basic Components

  • Resources: Anything can be a resource such as a person, a car, a website, etc.
  • Classes: They are used to categorize resources.
  • Properties: They describe the relationship between resources.
  • Literals: They are basic values such as strings, numbers, etc.

Classes and Subclasses

In RDFS, we can define a class using the rdfs:Class. The rdfs:subClassOf property is used to represent inheritance between classes.

ex:Person a rdfs:Class .
ex:Student a rdfs:Class ;
    rdfs:subClassOf ex:Person .
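The inheritance expressed by rdfs:subClassOf means that an instance of a subclass is also an instance of every superclass. A minimal sketch in plain Python, with triples as tuples: the instance ex:alice and the class ex:Agent are hypothetical additions, and a real RDFS reasoner applies more rules than this one.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

graph = {
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),   # hypothetical superclass
    ("ex:alice", RDF_TYPE, "ex:Student"),     # hypothetical instance
}


def infer_types(triples):
    """Propagate rdf:type along rdfs:subClassOf until nothing new appears."""
    inferred = set(triples)
    while True:
        new = {
            (x, RDF_TYPE, d)
            for (x, p, c) in inferred if p == RDF_TYPE
            for (c2, q, d) in inferred if q == SUBCLASS_OF and c2 == c
        } - inferred
        if not new:
            return inferred
        inferred |= new


closure = infer_types(graph)
print(sorted(t for t in closure if t[1] == RDF_TYPE))
```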

Properties

RDFS includes the ability to describe properties (also called predicates), which are the named relations that link resources together:

  • rdf:Property: The class of all RDF properties.
  • rdfs:domain: The class of the subject in a triple.
  • rdfs:range: The class of the object in a triple.

Example:

ex:author rdf:type rdf:Property;
    rdfs:domain ex:Book;
    rdfs:range ex:Person .

Inference in RDFS

One of the key advantages of RDFS is the ability to make inferences, or to derive additional information from the existing knowledge base.

Example:

ex:HarryPotter ex:author ex:JKRowling.
ex:author rdfs:domain ex:Book.
ex:author rdfs:range ex:Person.

From this information, we can infer:

ex:HarryPotter rdf:type ex:Book.
ex:JKRowling rdf:type ex:Person.
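The two inferences above follow from the rdfs:domain and rdfs:range rules. They can be sketched in plain Python over tuples; this is an illustration of those two rules only, and the function name is hypothetical.

```python
graph = {
    ("ex:HarryPotter", "ex:author", "ex:JKRowling"),
    ("ex:author", "rdfs:domain", "ex:Book"),
    ("ex:author", "rdfs:range", "ex:Person"),
}


def infer_domain_range(triples):
    """Type subjects by the property's domain and objects by its range."""
    inferred = set()
    for (prop, p, cls) in triples:
        if p == "rdfs:domain":
            inferred |= {(s, "rdf:type", cls) for (s, q, o) in triples if q == prop}
        elif p == "rdfs:range":
            inferred |= {(o, "rdf:type", cls) for (s, q, o) in triples if q == prop}
    return inferred


new_triples = infer_domain_range(graph)
print(sorted(new_triples))
```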

Labels and Comments

In RDFS, it is possible to add human-readable labels and comments to resources. This makes the RDF document easier to understand for individuals reviewing the data. It can be particularly helpful for understanding the semantics of an RDF document without needing to look up the definitions of resources and properties in the schema.

  • rdfs:label: Provides a human-readable version of a resource’s name.
  • rdfs:comment: Gives a brief description of a resource.

Example:

ex:Person a rdfs:Class;
    rdfs:label "Person";
    rdfs:comment "Represents a person" .

ex:Student a rdfs:Class;
    rdfs:subClassOf ex:Person;
    rdfs:label "Student";
    rdfs:comment "Represents a student, which is a type of person" .

In the above example, rdfs:label and rdfs:comment are used to provide a human-readable name and description for the ex:Person and ex:Student classes.

RDFS Limitations

  • It cannot express characteristics of properties (e.g., it cannot say that a property is transitive, symmetric, etc.).
  • It doesn’t allow the definition of constraints (i.e., it cannot limit the number of instances of a class, cannot enforce a property to have a single value, etc.).
  • It doesn’t support logical operators to combine classes (i.e., it cannot create a new class as a union, intersection, or complement of other classes).

OWL (Web Ontology Language)

OWL is a more expressive language than RDFS and is used to create ontologies. An ontology is a specification of a conceptualization, or a way of representing knowledge.

Overview of OWL

OWL provides more complex classes and relationships than RDFS, including:

  • Symmetry, transitivity, and inverses for properties: for example, if A is a sibling of B, then B is a sibling of A.
  • Enumerated classes: that is, classes that have a specific, predefined list of members.
  • Boolean combinations of classes: intersections (AND), unions (OR), and complements (NOT) of classes.
  • Cardinality restrictions: for example, stating that each instance of a certain class must be related to exactly two instances of another class.

OWL 2 Profiles

OWL 2, the most recent version of the Web Ontology Language, includes three profiles designed to meet different use case requirements and computational needs.

OWL 2 EL

This profile is designed for applications that require very large ontologies. The expressivity of the language is restricted to ensure that all reasoning tasks can be performed in polynomial time. This profile is especially relevant in fields like bioinformatics where ontologies can contain millions of classes.

OWL 2 QL

This profile is optimized for query answering over large datasets. It is mainly intended for applications that use data repositories managed through relational database systems. OWL 2 QL is a tractable language with a lower computational complexity, ensuring queries can be answered efficiently even when dealing with voluminous data.

OWL 2 RL

This profile is aimed at rule-based reasoning. The expressivity of the language is reduced to enable implementation of reasoners using rule-based technologies. It is designed for scalable reasoning while maintaining an acceptable level of expressivity.

OWL Classes and Properties

OWL builds upon RDFS by adding additional class and property types.

ex:Parent a owl:Class ;
    rdfs:subClassOf [
      a owl:Restriction ;
      owl:onProperty ex:hasChild ;
      owl:someValuesFrom ex:Person 
    ] .

Equivalent Classes and Properties

OWL allows for specifying that two classes or properties are equivalent.

  • owl:equivalentClass: The classes have the same instances.
  • owl:equivalentProperty: The properties relate the same pairs of instances.

ex:Mother a owl:Class ;
    owl:equivalentClass [
      a owl:Class ;
      owl:intersectionOf (
        ex:Parent
        [ a owl:Restriction ;
          owl:onProperty ex:gender ;
          owl:hasValue ex:Female ]
      )
    ] .

Disjoint Classes

In OWL, the owl:disjointWith property specifies that two classes have no instances in common.

ex:Male a owl:Class .
ex:Female a owl:Class .
ex:Male owl:disjointWith ex:Female .

In the above example, the classes ex:Male and ex:Female are specified as being disjoint, meaning an individual cannot be an instance of both these classes.
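To make the effect of disjointness concrete, the sketch below looks for individuals typed with two disjoint classes, which an OWL reasoner would report as an inconsistency. It is a simplified check in plain Python, not a reasoner; the individuals ex:Sam and ex:Kim are hypothetical.

```python
graph = {
    ("ex:Male", "owl:disjointWith", "ex:Female"),
    ("ex:Sam", "rdf:type", "ex:Male"),     # hypothetical individuals
    ("ex:Sam", "rdf:type", "ex:Female"),
    ("ex:Kim", "rdf:type", "ex:Female"),
}


def disjointness_violations(triples):
    """Return individuals asserted to belong to two disjoint classes."""
    disjoint_pairs = [(s, o) for (s, p, o) in triples if p == "owl:disjointWith"]
    types = {}
    for (s, p, o) in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return sorted(
        individual
        for individual, classes in types.items()
        if any(a in classes and b in classes for (a, b) in disjoint_pairs)
    )


violations = disjointness_violations(graph)
print(violations)
```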

OWL Individual

OWL individuals are instances of classes. They can have properties associated with them.

ex:John a ex:Person ;
    ex:name "John Doe" ;
    ex:hasChild ex:David .

Identity and Interlinking

owl:sameAs is used to declare that two URI references actually refer to the same thing. If you state that A owl:sameAs B, you’re stating that any property that A has, B also has, and vice versa.

ex:JohnDoe owl:sameAs ex:JohnathanDoe .

In this example, ex:JohnDoe and ex:JohnathanDoe are considered to be the same individual.

Property Characteristics

OWL allows for the specification of certain characteristics of properties.

owl:FunctionalProperty

This specifies that a property is functional, meaning that a given subject can have at most one value for this property. For example, a person has exactly one biological mother, so a property such as hasBiologicalMother is functional.

owl:InverseFunctionalProperty

This specifies that a property is inverse-functional, meaning that a given value can have at most one subject for this property. For example, a property such as biologicalMotherOf is inverse-functional: a mother can have many children, but each child (the value) has exactly one biological mother (the subject).

owl:TransitiveProperty

This specifies that a property is transitive, meaning that if A is related to B, and B is related to C, then A is related to C. An example would be the property “ancestorOf”.
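What a reasoner derives from a transitive property can be sketched as a transitive closure over pairs. The names below are hypothetical sample data for an “ancestorOf” relation; the loop is a simple fixpoint, not an optimized algorithm.

```python
def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until stable."""
    closure = set(pairs)
    while True:
        new = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new:
            return closure
        closure |= new


# Hypothetical ancestorOf assertions.
ancestor_of = {("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dan")}
closure = transitive_closure(ancestor_of)
print(sorted(closure))
```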

Negative Assertions

OWL 2 introduces the possibility of stating negative information. This is done through the owl:NegativePropertyAssertion construct:

[ a owl:NegativePropertyAssertion ;
  owl:sourceIndividual ex:John ;
  owl:assertionProperty ex:hasSibling ;
  owl:targetIndividual ex:Mary ] .

In the above example, the statement asserts that John does not have Mary as a sibling.

RDFS and OWL Conclusion

In conclusion, RDFS and OWL are critical components in creating and maintaining the Semantic Web. RDFS provides a basic way to create a vocabulary for describing resources and their relationships. OWL takes it a step further, allowing for a more expressive and detailed way to describe resources and the relationships between them.

SPARQL

SPARQL basics

  • The syntax looks similar to SQL
  • The features are similar to SQL
  • A family of standards:
    • SELECT queries
    • Update (INSERT / DELETE) queries
    • Protocols
    • Reasoning at query time
  • Standards for managing RDF data in general
  • SQL and SQL DBMS are to the relational data model what SPARQL and its standards are to the RDF data model

SPARQL SELECT

  • Variable: an element of a set disjoint from IRIs, literals and blank nodes
  • Basic graph pattern: an RDF graph where the subject, predicate, or object can be replaced by a variable
  • An answer to a SELECT query is a mapping from variables in the query to IRIs union literals union blank nodes in the queried graph
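The definitions above can be illustrated with a naive matcher: each triple pattern is matched against every triple, and bindings are kept only if they are consistent across patterns. This is a sketch in plain Python, far from how a real SPARQL engine evaluates queries; the sample graph and function names are hypothetical.

```python
def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple; None if impossible."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate_bgp(patterns, graph):
    """Return all variable mappings that satisfy every triple pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data.
graph = {
    ("ex:astro", "rdf:type", "space:Discipline"),
    ("ex:astro", "rdfs:label", "Astronomy"),
    ("ex:apollo11", "rdfs:label", "Apollo 11"),
}
patterns = [
    ("?subject", "rdfs:label", "?label"),
    ("?subject", "rdf:type", "space:Discipline"),
]
solutions = evaluate_bgp(patterns, graph)
print(solutions)
```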


SPARQL example

#Ex. 1
#Associate URIs with prefixes
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

#Example of a SELECT query, retrieving 2 variables
#Variables selected MUST be bound in graph pattern
SELECT ?subject ?label
WHERE {
    #This is our graph pattern
    ?subject rdfs:label ?label;
        rdf:type space:Discipline .
}

SPARQL example

#Ex. 2
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

#Example of a SELECT query, retrieving all variables
SELECT *
WHERE {
    ?subject rdfs:label ?label;
        rdf:type space:Discipline .
}

OPTIONAL bindings

How do we allow for missing or unknown information?

#Ex. 3
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?name ?country
WHERE {
    #This pattern must be bound
    ?thing rdfs:label ?name .
    #Anything in this block doesn't have to be bound
    OPTIONAL {
        ?thing space:country ?country .
    }
}
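OPTIONAL behaves like a left join: every solution of the mandatory pattern is kept, and the optional variable is simply left unbound when no match exists. A small plain-Python sketch, using hypothetical data and None to stand for an unbound variable:

```python
# Hypothetical data: every thing has a label, only some have a country.
labels = {"ex:sputnik": "Sputnik 1", "ex:apollo": "Apollo 11"}
countries = {"ex:sputnik": "USSR"}

# Keep every labelled thing; the country stays None when it is missing.
rows = [(name, countries.get(thing)) for thing, name in sorted(labels.items())]
print(rows)
```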

UNION queries

How do we allow for alternatives or variations in the graph?

#Ex. 4
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?subject ?displayLabel
WHERE {
    {
        ?subject foaf:name ?displayLabel .
    }
    UNION
    {
        ?subject rdfs:label ?displayLabel .
    }
}

Sorting & Restrictions

How do we apply a sort order to the results and restrict the number of results returned?

#Ex. 5
#Select the URI and the mass of the 11th to 20th heaviest spacecraft
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?spacecraft ?mass
WHERE {
    ?spacecraft space:mass ?mass .
}
#Use an ORDER BY clause to apply a sort. Can be ASC or DESC
ORDER BY DESC(?mass)
#Limit to ten results
LIMIT 10
#Apply an offset to get next "page"
OFFSET 10
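The combination of ORDER BY, LIMIT, and OFFSET corresponds to sorting followed by slicing. A sketch with hypothetical spacecraft masses:

```python
# Hypothetical data: 30 spacecraft with masses 100.0, 200.0, ..., 3000.0.
masses = {f"ex:craft{i}": float(i * 100) for i in range(1, 31)}

# ORDER BY DESC(?mass)
ranked = sorted(masses.items(), key=lambda item: item[1], reverse=True)

# LIMIT 10 OFFSET 10: skip the ten heaviest, keep the next ten.
offset, limit = 10, 10
page = ranked[offset:offset + limit]
print(page[0], page[-1])
```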

Filtering

How do we restrict results based on aspects of the data rather than the graph, e.g., string matching?

#Sample data for Sputnik launch
<http://purl.org/net/schemas/space/launch/1957-001> rdf:type space:Launch;
#Assign a datatype to the literal, to indicate it is a date
    space:launched "1957-10-04"^^xsd:date;
    space:spacecraft
        <http://purl.org/net/schemas/space/spacecraft/1957-001B>.

Filtering

How do we restrict results based on aspects of the data rather than the graph, e.g., string matching?

#Ex. 6
#Select name of spacecraft launched between 1st Jan 1969 and 1st Jan 1970
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?name
WHERE {
    ?launch space:launched ?date;
        space:spacecraft ?spacecraft .
    ?spacecraft foaf:name ?name .
    FILTER (?date > "1969-01-01"^^xsd:date &&
        ?date < "1970-01-01"^^xsd:date)
}

Filtering

How do we restrict results based on aspects of the data rather than the graph, e.g., string matching?

#Ex. 7
#Select spacecraft with a mass of less than 90kg
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?spacecraft ?name
WHERE {
    ?spacecraft foaf:name ?name;
        space:mass ?mass .
    #Note that we have to cast the data to the right type
    #As it is not declared in the data
    FILTER( xsd:double(?mass) < 90.0 )
}

Filtering

How do we restrict results based on aspects of the data rather than the graph, e.g., string matching?

#Ex. 8
#Select spacecraft with a name like “ollo”
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?name
WHERE {
    ?spacecraft foaf:name ?name .
    FILTER( regex(?name, "ollo", "i" ) )
}
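The regex filter with the "i" flag is a case-insensitive substring search. The same test in Python, on hypothetical names:

```python
import re

# Hypothetical spacecraft names.
names = ["Apollo 11", "Sputnik 1", "APOLLO 13", "Voyager 2"]

# Equivalent of FILTER( regex(?name, "ollo", "i") )
matches = [name for name in names if re.search("ollo", name, re.IGNORECASE)]
print(matches)
```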

Built-In Filters

  • Logical: !, &&, ||
  • Math: +, -, *, /
  • Comparison: =, !=, >, <, …
  • SPARQL tests: isURI, isBlank, isLiteral, bound
  • SPARQL accessors: str, lang, datatype
  • Other: sameTerm, langMatches, regex

DISTINCT

How do we remove duplicate results?

#Ex. 9
#Select the distinct agencies that operate spacecraft
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?agency
WHERE {
    ?spacecraft space:agency ?agency .
}

Extended Query Language Power (SPARQL 1.1)

  • Aggregates
  • Sub-queries
  • Negation and filtering
  • Property paths
  • Introducing new variables
  • Basic federated query
  • Graph Patterns inside FILTERs

Aggregates

  • AVG(expr)
  • COUNT(*) and COUNT(expr)
  • GROUP_CONCAT(expr)
  • MAX(expr)
  • MIN(expr)
  • SAMPLE(expr)
  • SUM(expr)

Aggregates (cont.)

  • All are allowed with and without DISTINCT across the arguments.
  • Grouping of results is optionally done with GROUP BY; otherwise the entire result set forms one group (like SQL). GROUP BY may also bind a variable.
  • HAVING executes a filter expression over the results of an aggregation (like SQL)
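Grouping, aggregation, and HAVING can be sketched in plain Python. The example assumes a hypothetical query that groups spacecraft by agency, averages their mass, and keeps only agencies with at least two spacecraft.

```python
from collections import defaultdict

# Hypothetical (agency, mass) solutions before aggregation.
rows = [("ex:NASA", 700.0), ("ex:NASA", 300.0), ("ex:ESA", 50.0)]

# GROUP BY ?agency
groups = defaultdict(list)
for agency, mass in rows:
    groups[agency].append(mass)

# AVG(?mass) per group, with HAVING (COUNT(?mass) >= 2)
average_mass = {
    agency: sum(values) / len(values)
    for agency, values in groups.items()
    if len(values) >= 2
}
print(average_mass)
```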

Sub-queries

SPARQL 1.1 allows sub-SELECTs

#Ex. 10
PREFIX : <http://people.example/>

SELECT ?y ?minName
WHERE {
    :alice :knows ?y .
    {
        SELECT ?y (MIN(?name) AS ?minName)
        WHERE {
            ?y :name ?name .
        }
        GROUP BY ?y
    }
}

Negation and Filtering

  • Ways to express negation / exclusion:
    • OPTIONAL { graph-pattern } with FILTER(!bound(?var)) (1.0)
    • FILTER … !expr (1.0)
    • FILTER NOT EXISTS { graph-pattern } (1.1)
  • Aggregation using HAVING with either of the above (1.1)
  • graph-pattern MINUS graph-pattern (1.1)
  • (Some of these can be done with complex UNION and OPTIONAL patterns)

Property path

  • This changes the fundamental SPARQL matching
    • From: Triple pattern matches a triple to bind variables.
    • To: Triples with property paths regex-like match multiple triples to bind variables.
  • Depending on the data, the query engine could do a simple match or do a lot of searching for matches.
  • New syntax to select different properties from a subject node:
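    • a/b ^a a|b a* a+ a? !a where a and b are property IRIs (the counted forms a{m,n}, a{n}, a{m,}, a{,n} appeared in working drafts but are not part of the final SPARQL 1.1 Recommendation)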
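Two of these operators can be sketched in plain Python: the sequence path a/b follows one property after another, and a+ keeps following the same property until no new node is reached. The graph and the helper names are hypothetical; real engines use more efficient strategies.

```python
# Hypothetical data.
graph = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:name", "Carol"),
}


def follow(nodes, prop):
    """Objects reachable from any node in nodes through one prop edge."""
    return {o for (s, p, o) in graph if p == prop and s in nodes}


def sequence_path(start, props):
    """Evaluate a sequence path such as a/b/c from a start node."""
    nodes = {start}
    for prop in props:
        nodes = follow(nodes, prop)
    return nodes


def one_or_more(start, prop):
    """Evaluate a+ : nodes reachable through one or more prop edges."""
    reached, frontier = set(), {start}
    while frontier:
        frontier = follow(frontier, prop) - reached
        reached |= frontier
    return reached


names = sequence_path("ex:alice", ["foaf:knows", "foaf:knows", "foaf:name"])
contacts = one_or_more("ex:alice", "foaf:knows")
print(names, sorted(contacts))
```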

Basic Federated Queries

  • A graph pattern that invokes a SPARQL protocol call and remote query returning the usual result formats
  • Allows querying multiple SPARQL databases in one query
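    #Ex. 11
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX ex: <http://example.org/> # hypothetical namespace
    SELECT ?person
    WHERE {
      ?person foaf:knows ?x .
      SERVICE <http://social-db.com/sparql/> {
          ?x foaf:name ?name;
              ex:birthdate ?b .
      }
    }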

More

  • More functions and operators
  • Introducing new variables
  • RDF graph database management:
    • INSERT triples / graphs
    • DELETE triples / graphs
  • ASK, DESCRIBE, CONSTRUCT

Storage

In Files

  • Turtle: a compact, human-friendly format.
  • N-Triples: a very simple, easy-to-parse, line-based format that is not as compact as Turtle.
  • TriG: an extension of Turtle to datasets.
  • N-Quads: a superset of N-Triples, for serializing multiple RDF graphs.
  • RDF/XML: the first standard format for serializing RDF.
  • RDF/JSON: an alternative syntax for expressing RDF triples using a simple JSON notation.

Embedded Annotations

Embedded annotations refer to the process of integrating structured data into web pages. This integration is crucial for web crawlers or other machines to understand the content of the web page and its context better.

Schema.org

Schema.org is a collaborative effort, founded by Google, Microsoft, Yahoo, and Yandex, aiming to create, maintain, and promote schemas for structured data on the Internet. It provides a collection of shared vocabularies webmasters can use to mark up their pages in ways that can be understood by the major search engines.

JSON-LD

JSON-LD is a World Wide Web Consortium (W3C) standard for encoding Linked Data using JSON.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://example.com/article"
  },
  "headline": "Example Article",
  "image": "https://example.com/photos/1x1/photo.jpg",
  "author": {
    "@type": "Person",
    "name": "John Doe"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example.com",
    "logo": {
      "@type": "ImageObject",
      "url": "https://example.com/logo.jpg"
    }
  },
  "datePublished": "2022-05-10T08:00:00+08:00",
  "dateModified": "2022-05-20T09:20:00+08:00"
}
</script>
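Because JSON-LD is ordinary JSON, a consumer can read it with any JSON parser before applying Linked Data processing. The sketch below loads a shortened copy of the annotation above with Python's standard json module; it does not perform JSON-LD expansion or context resolution.

```python
import json

# Shortened copy of the annotation shown above.
annotation = """
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example Article",
  "author": {"@type": "Person", "name": "John Doe"},
  "datePublished": "2022-05-10T08:00:00+08:00"
}
"""

document = json.loads(annotation)
author_name = document["author"]["name"]
print(document["@type"], "by", author_name)
```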

Microdata

Microdata is an HTML specification used to nest structured data within HTML content.

<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">John Doe</span>
  <span itemprop="email">john@example.com</span>
</div>

RDFa

RDFa (Resource Description Framework in Attributes) is an HTML5 extension that supports linked data.

<div vocab="http://schema.org/" typeof="Person">
  <span property="name">John Doe</span>
  <span property="email">john@example.com</span>
</div>

Use Cases

In addition to SEO and search engine discovery:

  • Rich Search Results: Inclusion in search results of features like images, reviews, and more.
  • Social Media Cards: Use of structured data to create a preview of the link with a title, description, and image.
  • Voice Search and Virtual Assistants: Use of structured data to understand and respond to voice queries.
  • Email Marketing: To provide users with actions they can take directly from their inbox.

Native RDF Stores

  • Also known as triplestores
  • Specifically designed to store, retrieve, and manage RDF data
  • Optimized for SPARQL queries
  • Highly efficient
  • Examples: Virtuoso, Jena TDB, Stardog

RDF-Enabled Relational Databases

  • Traditional relational databases that have been extended to store RDF data
  • Different techniques like specific relational schemas
  • Examples: D2RQ, Virtuoso RDF Views

NoSQL Databases for RDF Storage

  • Graph databases
  • Scalability
  • Schema-less
  • Examples: AllegroGraph, Neo4j

RDF in the Cloud

  • Cost-effectiveness
  • Scalability
  • Distribution
  • Examples: Amazon Neptune, Google Cloud Datastore, RDF4J Server

Programming languages

Python

rdflib

from rdflib import Graph, Literal, BNode, Namespace, RDF, URIRef

n = Namespace("http://example.org/people/")
g = Graph()

john = BNode()
g.add((john, RDF.type, n.Person))
g.add((john, n.name, Literal('John')))

RDF example

Python

from rdflib import Graph

g = Graph()
g.parse("http://example.org/")

qres = g.query(
    """
    SELECT ?subject ?predicate ?object
    WHERE {
        ?subject ?predicate ?object.
    }
    """)

# Each result row holds the three selected variables
for s, p, o in qres:
    print(s, p, o)

SPARQL example

Java

Apache Jena

import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.VCARD;

Model model = ModelFactory.createDefaultModel();
Resource johnSmith = model.createResource(
          "http://example.org/people/JohnSmith");
johnSmith.addProperty(VCARD.FN, "John Smith");

RDF example

Java

import org.apache.jena.query.*;

String sparqlQueryString = 
    "SELECT ?subject ?predicate ?object\n" +
    "WHERE {\n" +
    "    ?subject ?predicate ?object .\n" +
    "}\n";

Query query = QueryFactory.create(sparqlQueryString);
// 'dataset' is assumed to be an existing Dataset holding the RDF data
QueryExecution qexec = QueryExecutionFactory.create(query, dataset);
ResultSet results = qexec.execSelect();
ResultSetFormatter.out(System.out, results, query);

SPARQL example

C#

dotNetRDF

using VDS.RDF;

var g = new Graph();
var dotNetRDF = g.CreateUriNode(
  UriFactory.Create("http://example.org/people/JohnSmith"));
g.Assert(new Triple(
  dotNetRDF, 
  g.CreateUriNode(
    UriFactory.Create("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")), 
  g.CreateUriNode(UriFactory.Create("http://example.org/Person"))));

RDF example

C#

using VDS.RDF.Query;

SparqlQueryParser parser = new SparqlQueryParser();
SparqlQuery query = parser.ParseFromString(
  @"SELECT ?subject ?predicate ?object 
  WHERE { ?subject ?predicate ?object . }");

SparqlResultSet resultSet = endpoint.QueryWithResultSet(query);
foreach (SparqlResult result in resultSet)
{
    Console.WriteLine(result.ToString());
}

SPARQL example

JavaScript

rdflib.js

var $rdf = require('rdflib');

var store  = $rdf.graph();
var person = $rdf.sym('http://example.org/people/JohnSmith');
var name = $rdf.sym('http://schema.org/name');

store.add(person, name, 'John Smith', person.doc());

RDF example

JavaScript

var $rdf = require('rdflib');

var store  = $rdf.graph();
// Parse the Turtle data into the store (base IRI: http://example.org/)
$rdf.parse(`your RDF data here`, store, 'http://example.org/', 'text/turtle');

var query = $rdf.SPARQLToQuery(
  `SELECT ?subject ?predicate ?object 
  WHERE { ?subject ?predicate ?object . }`, false, store);

store.query(query, function(result) {
    console.log(
      result['?subject'].value, 
      result['?predicate'].value, result['?object'].value);
});

SPARQL example

RDF-star and SHACL

RDF-star and SPARQL-star

RDF-star Basics

  • Extension of RDF
  • Express more complex RDF graphs
  • Triples about triples
  • Soon a W3C standard

RDF and RDF-star: A Quick Comparison

ex:Alice ex:knows ex:Bob .

RDF Graph

<< ex:Alice ex:knows ex:Bob >> ex:assertedBy ex:Carol .

RDF-star Graph

Key Concepts of RDF-star

  • IRIs (Internationalized Resource Identifiers)
  • Literals
  • Blank Nodes
  • Triples: In RDF-star, triples consist of a subject, a predicate, and an object. The subject can be an IRI, a blank node, or another (quoted) triple; the object can additionally be a literal.

Application of RDF-star

  • To represent metadata about statements
    • The source of a statement
    • The time the statement was made
    • The level of confidence in the statement
  • More precise and nuanced knowledge representation
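As a minimal sketch of the idea, a quoted triple can be modelled in plain Python as a tuple that appears as the subject of other tuples. This is an illustration only, not an RDF library: the `Triple` class and the `annotations` helper are hypothetical, and the predicates `ex:assertedOn` and `ex:confidence` as well as the values attached to them are made up for the example.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    subject: "Term"
    predicate: str
    object: "Term"


# A term is an IRI/literal (kept as plain Python values here) or a nested triple.
Term = Union[str, float, Triple]

# The base statement.
knows = Triple("ex:Alice", "ex:knows", "ex:Bob")

# Statements about the statement: the quoted triple is used as the subject.
graph = {
    knows,
    Triple(knows, "ex:assertedBy", "ex:Carol"),   # source of the statement
    Triple(knows, "ex:assertedOn", "2023-06-01"),  # time (hypothetical predicate)
    Triple(knows, "ex:confidence", 0.9),           # confidence (hypothetical predicate)
}


def annotations(graph, statement):
    """Return {predicate: object} for every triple whose subject is `statement`."""
    return {t.predicate: t.object for t in graph if t.subject == statement}


meta = annotations(graph, knows)
print(meta["ex:assertedBy"])
```

Because tuples are hashable and compared by value, the same statement can be annotated from several places without needing an extra identifier for it.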

SPARQL-star Basics

  • Extension of SPARQL
  • Supports RDF-star data
  • Soon a W3C standard

SPARQL and SPARQL-star: A Quick Comparison

SELECT ?object
WHERE {
    ex:Alice ex:knows ?object .
}

SPARQL query

SELECT ?assertedBy
WHERE {
    << ex:Alice ex:knows ex:Bob >> 
        ex:assertedBy  ?assertedBy .
}

SPARQL-star query

Key Concepts of SPARQL-star

  • Query Forms: SELECT, CONSTRUCT, DESCRIBE, and ASK
  • Variables: placeholders used to capture and return parts of the data, including nested triples
  • Triple Patterns
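To make the role of variables and triple patterns concrete, here is a toy pattern matcher in plain Python. It is a sketch under simplifying assumptions (a single triple pattern, no joins, no FILTER) and is not how a SPARQL-star engine is implemented; `match` and `select` are hypothetical helper names. Strings starting with `?` are variables and tuples stand for (possibly nested) triples.

```python
def match(pattern, term, bindings):
    """Match one pattern term against one data term.

    Returns an extended copy of `bindings`, or None if there is no match.
    """
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            # A variable already bound must match the same value again.
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        # Nested triple: match subject, predicate and object recursively.
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def select(graph, pattern):
    """Return one binding dictionary per triple of `graph` matching `pattern`."""
    solutions = []
    for triple in graph:
        bindings = match(pattern, triple, {})
        if bindings is not None:
            solutions.append(bindings)
    return solutions


knows = ("ex:Alice", "ex:knows", "ex:Bob")
graph = [knows, (knows, "ex:assertedBy", "ex:Carol")]

# Same shape as the SPARQL-star query: who asserted the quoted triple?
print(select(graph, (knows, "ex:assertedBy", "?assertedBy")))
# Variables can also appear inside the nested triple.
print(select(graph, (("?s", "ex:knows", "?o"), "ex:assertedBy", "?who")))
```

A variable in subject position can also bind to a whole nested triple, which is what "including nested triples" means for SPARQL-star variables.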

SHACL

  • Shapes Constraint Language
  • W3C standard
  • Validation of RDF
    • Users can describe and enforce constraints on RDF graphs
    • Ensure data quality and consistency

Why is SHACL Useful?

  • Data Quality Assurance: ensure data integrity
  • Schema Documentation: self-documentation (implicitly define the schema of the RDF graph)
  • Form Generation: generation of forms in a UI
  • Data Integration: validation of the results of data integration processes

Basic Concepts

SHACL describes the shape of an RDF graph through a set of constraints. Each constraint is associated with a SHACL shape, and each shape is associated with one or more target nodes.

SHACL Shapes

A SHACL shape is a collection of conditions that the data must satisfy.

  • the type of data that a property can have
  • the number of values a property can have
  • the format of a string (regex)

Target Nodes

Target nodes are the RDF nodes that a shape applies to.

  • node type
  • property value
  • explicit declaration

SHACL example

@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The following will pass
ex:John a ex:Person ;
  ex:age 30 ;
  ex:email "john@example.com"^^xsd:string .

# The following will fail: age is below the minimum of 18
ex:Bob a ex:Person ;
  ex:age 17 .

# The following will fail: email does not match the pattern
ex:Alice a ex:Person ;
  ex:age 21 ;
  ex:email "sdffsd"^^xsd:string .

@prefix ex: <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:PersonShape a sh:NodeShape ;
  sh:targetClass ex:Person ;
  sh:property [
    sh:path ex:age ;
    sh:datatype xsd:integer ;
    sh:minInclusive 18 ;
    sh:maxInclusive 99 ;
    sh:severity sh:Violation ;
  ] ;
  sh:property [
    sh:path ex:email ;
    sh:pattern ".+@.+\\..+" ;
    sh:severity sh:Violation ;
  ] .
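
What a validator does with this shape can be sketched in plain Python. This is only an illustration of the checks, not a SHACL engine: the data graph is hand-translated into a dictionary, and `validate_person` is a hypothetical function that hard-codes the two property constraints of `ex:PersonShape`.

```python
import re

# Hand-translated data graph: node -> property -> list of values.
people = {
    "ex:John": {"ex:age": [30], "ex:email": ["john@example.com"]},
    "ex:Bob": {"ex:age": [17]},
    "ex:Alice": {"ex:age": [21], "ex:email": ["sdffsd"]},
}

# Same regular expression as sh:pattern (one backslash once Turtle escaping is removed).
EMAIL_PATTERN = re.compile(r".+@.+\..+")


def validate_person(properties):
    """Return a list of (path, offending value) pairs; empty means the node conforms."""
    violations = []
    for age in properties.get("ex:age", []):
        # sh:datatype xsd:integer, sh:minInclusive 18, sh:maxInclusive 99
        if not isinstance(age, int) or not 18 <= age <= 99:
            violations.append(("ex:age", age))
    for email in properties.get("ex:email", []):
        # sh:pattern: the expression may match anywhere in the value
        if not EMAIL_PATTERN.search(email):
            violations.append(("ex:email", email))
    return violations


report = {node: validate_person(props) for node, props in people.items()}
for node, violations in report.items():
    print(node, "conforms" if not violations else violations)
```

Note that `ex:Bob` fails only because of his age: the shape sets no minimum count on `ex:email`, so a missing email is not a violation.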

Advanced SHACL Concepts

Advanced SHACL concepts include logical constraints (such as AND, OR, and NOT), complex property paths, and shape hierarchies.

Comparison: SHACL vs OWL

  • Complexity and Performance
    • OWL has more expressive power
    • SHACL offers better performance
  • Validation vs Inference
    • OWL is an inference language (derive new facts)
    • SHACL checks whether data conforms to a specific shape or not (no new facts)
  • User Friendliness: SHACL is easier to learn
  • Tool Support: broader tool support for OWL

R2RML and RML

R2RML

R2RML Basics

  • RDB to RDF Mapping Language
  • W3C standard
  • Language for expressing customized mappings from relational databases to RDF graphs

Key Components of R2RML

  • Triples Maps: rules to generate RDF triples
    • from the rows of a database table
    • or a SQL query’s result
  • Logical Table: base table (or view) in the database, or a custom SQL query that provides the data
  • Subject Map: a template that generates the RDF subject for each row
  • Predicate-Object Maps (POM): define how to generate the RDF predicate and object for each row
  • RefObjectMap: generate RDF triples when you have foreign key relationships

Example of R2RML

Let’s assume we have a simple database table “Student” with columns “id”, “name”, and “email”.

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/> .

<#TriplesMap1>
  rr:logicalTable [ rr:tableName "Student" ];
  rr:subjectMap [ rr:template "http://example.com/student/{id}" ];
  rr:predicateObjectMap [
    rr:predicate ex:name;
    rr:objectMap [ rr:column "name" ]
  ];
  rr:predicateObjectMap [
    rr:predicate ex:email;
    rr:objectMap [ rr:column "email" ]
  ].
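
The effect of this mapping can be sketched with Python's built-in `sqlite3` module. This is a hand-written imitation of what an R2RML processor would produce, not an R2RML implementation: the sample row, the in-memory database, and the names `SUBJECT_TEMPLATE`, `PREDICATE_OBJECT_MAPS` and `generate_triples` are assumptions made for the example.

```python
import sqlite3

# In-memory stand-in for the relational database, with one illustrative row.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE Student (id INTEGER, name TEXT, email TEXT)")
conn.execute("INSERT INTO Student VALUES (1, 'Alice', 'alice@example.com')")

# Subject map template and (predicate, column) pairs mirroring the mapping.
SUBJECT_TEMPLATE = "http://example.com/student/{id}"
PREDICATE_OBJECT_MAPS = [("ex:name", "name"), ("ex:email", "email")]


def generate_triples(conn):
    """Produce one triple per row and predicate-object map."""
    triples = []
    for row in conn.execute("SELECT * FROM Student"):  # logical table
        subject = SUBJECT_TEMPLATE.format(**dict(row))  # subject map
        for predicate, column in PREDICATE_OBJECT_MAPS:
            if row[column] is not None:  # NULL values produce no triple
                triples.append((subject, predicate, row[column]))
    return triples


triples = generate_triples(conn)
for triple in triples:
    print(triple)
```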

RML

RML Basics

  • RDF Mapping Language
  • Extends R2RML
  • Mappings from various structured data formats (such as JSON, CSV, XML) to RDF datasets
  • The key components of RML are the same as those of R2RML
    • Except for the Logical Source (which replaces R2RML’s logical table)

Example of RML

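Suppose we have a simple CSV file “students.csv” with columns “id”, “name”, and “email”.

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix ex: <http://example.org/> .

<#TriplesMap1>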
  rml:logicalSource [
    rml:source "students.csv";
    rml:referenceFormulation ql:CSV
  ];
  rr:subjectMap [
    rr:template "http://example.com/student/{id}"
  ];
  rr:predicateObjectMap [
    rr:predicate ex:name;
    rr:objectMap [ rml:reference "name" ]
  ];
  rr:predicateObjectMap [
    rr:predicate ex:email;
    rr:objectMap [ rml:reference "email" ]
  ].
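
The point of the logical source is that the rest of the mapping does not depend on the input format. The sketch below illustrates this in plain Python by applying the same subject template and predicate-object pairs to CSV and JSON records. It is not an RML processor: the inline sample data and the `apply_mapping` function are assumptions made for the example, and skipping empty values is a simplification chosen for this sketch.

```python
import csv
import io
import json

SUBJECT_TEMPLATE = "http://example.com/student/{id}"
PREDICATE_OBJECT_MAPS = [("ex:name", "name"), ("ex:email", "email")]


def apply_mapping(records):
    """Apply the subject template and predicate-object maps to each record."""
    triples = []
    for record in records:
        subject = SUBJECT_TEMPLATE.format(**record)
        for predicate, reference in PREDICATE_OBJECT_MAPS:
            value = record.get(reference)
            if value:  # this sketch skips missing or empty values
                triples.append((subject, predicate, value))
    return triples


# Inline stand-ins for students.csv and an equivalent JSON document.
csv_text = "id,name,email\n1,Alice,alice@example.com\n2,Bob,\n"
json_text = (
    '[{"id": 1, "name": "Alice", "email": "alice@example.com"},'
    ' {"id": 2, "name": "Bob"}]'
)

# Only the way records are read (the logical source) differs.
from_csv = apply_mapping(csv.DictReader(io.StringIO(csv_text)))
from_json = apply_mapping(json.loads(json_text))

for triple in from_csv:
    print(triple)
print(from_csv == from_json)
```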

KG Embeddings

What are KG embeddings?

  • a technique used to represent the entities and relations in a knowledge graph as vectors in a continuous vector space
  • translate the high-dimensional, sparse, and often symbolic information in a knowledge graph into a low-dimensional, dense, and continuous space where semantic relationships are preserved

Common features

  1. Vector Space Representation
  2. Learning from Triples
  3. Distance or Similarity Measure
  4. Predictive Modeling
  5. Optimization Problem
  6. Unsupervised Learning
  7. Scalability

Why KG embeddings?

  1. Link Prediction
  2. Entity Resolution
  3. Entity Classification
  4. Recommendation Systems
  5. Question Answering Systems
  6. Drug Discovery

Knowledge Graph Embedding Techniques

TransE

  • How it works: Represents relationships as translations in the embedding space.
  • Pros: Simple, efficient, effectively captures semantic relationships between entities.
  • Cons: Struggles with modeling 1-to-N, N-to-1, and N-to-N relationships, cannot properly represent symmetric relations.
  • Paper: Translating Embeddings for Modeling Multi-relational Data
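
The translation idea can be written down in a few lines: a triple (h, r, t) is plausible when h + r lands close to t. The sketch below only shows the scoring function, using the Euclidean distance; the two-dimensional vectors are hand-picked for illustration and are not learned embeddings.

```python
import math


def transe_score(h, r, t):
    """Distance ||h + r - t||: the lower, the more plausible the triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hand-picked toy vectors (a trained model would learn these).
paris = [0.9, 0.1]
berlin = [0.1, 0.2]
france = [1.0, 1.0]
capital_of = [0.1, 0.9]

print(transe_score(paris, capital_of, france))   # close to 0: plausible
print(transe_score(berlin, capital_of, france))  # larger: less plausible
```

The same formula shows where the 1-to-N limitation comes from: if two different tails were both perfect answers for the same head and relation, both would have to equal h + r, and therefore each other.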

TransR

DistMult

HolE

  • How it works: Uses circular correlation of entity embeddings to model relationships.
  • Pros: Effective at modeling complex and asymmetric relationships, reduces number of parameters.
  • Cons: Could be computationally intensive due to the correlation operation.
  • Paper: Holographic Embeddings of Knowledge Graphs
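
Circular correlation and the resulting score can be sketched as follows. This is a naive quadratic-time loop written for readability, with hand-picked three-dimensional vectors rather than learned embeddings; the final sigmoid usually applied to the score is left out.

```python
def circular_correlation(a, b):
    """[a * b]_k = sum_i a[i] * b[(k + i) mod d]"""
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


def hole_score(h, r, t):
    """Dot product between the relation vector and the correlation of head and tail."""
    return sum(rk * ck for rk, ck in zip(r, circular_correlation(h, t)))


# Hand-picked toy vectors.
h = [1, 0, 0]
t = [0, 1, 0]
r = [0, 1, 0]

print(hole_score(h, r, t))  # score of (h, r, t)
print(hole_score(t, r, h))  # score of (t, r, h), head and tail swapped
```

Circular correlation is not commutative, so swapping head and tail generally changes the score, which is what allows asymmetric relations to be modeled. The correlation vector has the same dimension as the entity vectors, which is where the reduced number of parameters comes from.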

ComplEx

  • How it works: Uses complex-valued embeddings to better handle asymmetric relationships.
  • Cons: Complex-valued embeddings can be more challenging to interpret and require more computational resources, since complex numbers have to be handled.
  • Paper: Complex Embeddings for Simple Link Prediction
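
The scoring function can be sketched with Python's built-in complex numbers: the score is the real part of the sum of h_k · r_k · conj(t_k). The one-dimensional vectors below are hand-picked for illustration, not learned.

```python
def complex_score(h, r, t):
    """Real part of sum_k h[k] * r[k] * conjugate(t[k])."""
    return sum(hk * rk * tk.conjugate() for hk, rk, tk in zip(h, r, t)).real


# Hand-picked one-dimensional toy embeddings.
h = [1 + 0j]
t = [0 + 1j]
r = [0 + 1j]  # a relation embedding with a non-zero imaginary part

print(complex_score(h, r, t))  # score of (h, r, t)
print(complex_score(t, r, h))  # score of (t, r, h), head and tail swapped
```

The conjugate on the tail is what breaks the symmetry: with a purely real relation embedding the two scores would be equal, while an imaginary part lets them differ.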

RDF2Vec

  • How it works: Generates sequences of entities (walks) from the graph, and then applies the Word2Vec model on these walks to create the embeddings.
  • Pros: Captures both local and global semantic information, flexible, can work with different types of graphs.
  • Cons: Quality of embeddings depends on the walks generated, does not explicitly model relations.
  • Paper: RDF2Vec: RDF Graph Embeddings for Data Mining
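
The walk-generation step can be sketched in plain Python. This shows only the first half of the method, under simplifying assumptions: uniform random walks of fixed depth over a tiny hand-made graph, with predicates kept in the sequences. The `random_walks` function and the graph are made up for the example; the resulting sequences would then be passed as "sentences" to a Word2Vec implementation, which is not shown.

```python
import random


def random_walks(graph, start, depth, n_walks, seed=0):
    """Generate `n_walks` walks of at most `depth` hops starting at `start`.

    `graph` maps a node to a list of (predicate, object) pairs.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = graph.get(node, [])
            if not edges:
                break  # dead end: stop this walk early
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.append(walk)
    return walks


# Tiny hand-made graph.
graph = {
    "ex:Alice": [("ex:knows", "ex:Bob")],
    "ex:Bob": [("ex:knows", "ex:Carol"), ("ex:worksAt", "ex:Acme")],
}

walks = random_walks(graph, "ex:Alice", depth=2, n_walks=4)
for walk in walks:
    print(" -> ".join(walk))
```

Since the embeddings are learned only from these sequences, the number, depth and sampling strategy of the walks directly determine what the model can capture.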

RESCAL

RotatE

QuatE

  • How it works: Uses quaternion algebra to model entities and relations.
  • Pros: Captures more complex interactions and dependencies.
  • Cons: More computationally intensive and complex.
  • Paper: Quaternion Knowledge Graph Embeddings
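
The core operation is the Hamilton product of quaternions. The sketch below uses a single quaternion per entity and relation (real models use vectors of quaternions) and hand-picked values: the relation is normalized to a unit quaternion, the head is rotated by it, and the result is compared with the tail through a dot product.

```python
import math


def hamilton(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quate_score(h, r, t):
    """Rotate h by the normalized relation quaternion, then take the dot product with t."""
    norm = math.sqrt(sum(x * x for x in r))
    r_unit = tuple(x / norm for x in r)
    rotated = hamilton(h, r_unit)
    return sum(x * y for x, y in zip(rotated, t))


# Basis quaternions and a hand-picked relation (a scaled j).
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)
relation = (0, 0, 2, 0)

print(hamilton(i, j))               # i * j = k
print(hamilton(j, i))               # j * i = -k: the product is not commutative
print(quate_score(i, relation, k))  # rotating i by j gives k, so it matches the tail k
```

The sixteen multiplications per quaternion pair, compared with one for real-valued embeddings, give a sense of the extra computational cost.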

KG embeddings vs OWL & SPARQL

KG embeddings: Pros

  1. Scalability
  2. Predictive Power
  3. Robustness to Noise
  4. Inductive and abductive reasoning

KG embeddings: Cons

  1. Lack of Explicit Semantics
  2. Difficulty in Incorporating Prior Knowledge
  3. No deductive reasoning

OWL and SPARQL: Pros

  1. Explicit Semantics
  2. Incorporation of Prior Knowledge
  3. Standardization
  4. Deductive reasoning

OWL and SPARQL: Cons

  1. Scalability
  2. Lack of Predictive Power
  3. Sensitivity to Noise
  4. No inductive and abductive reasoning

Tools

TL;DR

RDF, RDFS, OWL

RDF (Resource Description Framework), RDFS (RDF Schema), and OWL (Web Ontology Language) are the foundational technologies that enable us to define and structure our data in a way that is both human-readable and machine-interpretable, creating rich, interconnected webs of data.

SPARQL

SPARQL is a powerful query language for RDF. SPARQL allows us to interrogate our data, ask complex questions, and extract valuable insights.

KG Storage

A great number of storage solutions exist for our Knowledge Graphs. Understanding the different approaches to KG storage is crucial for ensuring the performance, scalability, and long-term maintainability of our datasets.

RDF-star and SHACL

RDF-star is an extension of RDF that allows for more complex statements about other statements.

SHACL (Shapes Constraint Language) is a language for validating RDF graphs against a set of conditions. These technologies provide us with even more expressivity and reliability in our data handling.

R2RML and RML

In the R2RML and RML section, we learnt how to map our existing relational databases to RDF using R2RML (RDB to RDF Mapping Language), and how to transform various data formats (CSV, JSON, XML, etc.) into RDF using RML (RDF Mapping Language).

KG Embeddings

KG Embeddings allow us to represent nodes and relationships from our Knowledge Graph in a numerical, dense vector space. This is a powerful technique for applying machine learning methods to our KG, opening up possibilities for tasks like link prediction, entity resolution, and recommendation systems.

Outro

Not seen

  • Ontology Design and Engineering
  • Data Integration and Interlinking
  • KG Visualization
  • Privacy and Ethics in KGs

Additional resources

This course is largely based on the following resources:

Knowledge Graphs by Hogan et al. (2021)

CS 520 Knowledge Graphs and CS 224 Machine Learning with Graphs (Stanford courses)

Knowledge Representation and Reasoning and Semantic Web by Antoine Zimmermann

Embedding Knowledge Graphs with RDF2vec by Heiko Paulheim, Petar Ristoski, Jan Portisch

Contact

If you have any questions or comments, please do not hesitate to contact me:

pierre[dash]henri[dot]paris[at]telecom[dash]paris[dot]fr