This document describes TriQL.P, a query language for extracting information from Named Graphs published by untrustworthy sources.
One of the main goals of the Semantic Web initiative is to build an infrastructure that allows applications to use data provided by large networks of independent data sources. Aggregating data from different sources automatically raises questions about data quality and data trustworthiness. Before the gathered information can be used, its quality and trustworthiness have to be evaluated according to application-specific trust requirements. This trust decision can be based on:
The main objective of TriQL.P is to support different, subjective, and task-specific trust policies based on information from all four categories.
More information about the ideas behind TriQL.P is found in [BiOl04]. More information about publishing information on the Semantic Web using Named Graphs and digital signatures is found in [CaBiHaSt04].
TriQL.P extends TriQL, which is based on RDQL. The basic idea of TriQL is to use graph patterns for querying sets of Named Graphs. A graph pattern consists of an optional graph name and a set of triple patterns. In addition, TriQL.P allows
In contrast to the approach taken in [CaBiHaSt04], TriQL.P is assumed to be used within an architecture that has the following four layers:
This section describes how different trust policies can be expressed using TriQL.P. The examples use the "Semantic Web Publishing" vocabulary (swp:) defined in [CaBiHaSt04]. Using this vocabulary, Named Graphs can be published together with provenance information about themselves.
Example (using the TriG syntax):
# Example Document
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix swp: <http://www.w3.org/2004/03/trix/swp-1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ex: <http://www.example.org#> .
@prefix : <http://www.example.org/exampleDocument#> .
:G1 { ex:Monica ex:name "Monica Murphy" .
ex:Monica rdf:type ex:Person .
ex:Monica ex:homepage <http://www.monicamurphy.org> .
ex:Monica ex:email <mailto:monica@monicamurphy.org> .
ex:Monica ex:skill ex:Programming .
ex:Monica ex:skill ex:Management }
:G2 { ex:Franz rdf:type ex:Person .
ex:Franz ex:skill ex:Programming .
ex:Franz ex:affiliation ex:ProjectInterVal .
ex:Franz ex:affiliation ex:ProjectKnowledgeNet }
:G3 { :G1 swp:assertedBy _:w1 .
_:w1 swp:authority ex:Chris .
_:w1 dc:date "2003-10-02"^^xsd:date .
:G2 swp:quotedBy _:w2 .
:G3 swp:assertedBy _:w2 .
_:w2 dc:date "2003-09-03"^^xsd:date .
_:w2 swp:authority ex:Chris .
ex:Chris rdf:type ex:Person .
ex:Chris ex:email <mailto:chris@bizer.de> }
Context-Based Trust Mechanisms use metainformation about the circumstances in which information has been claimed, e.g. who said what, when and why [Marchiori04]. They include role-based trust mechanisms, using background information about the author's role or his membership in a specific group, for trust decisions.
Example Query 1: Get Monica's skills. Use only information that has been asserted by Chris.
SELECT ?skill
WHERE ?graph ( ex:Monica ex:skill ?skill )
(?graph swp:assertedBy ?warrant)
(?warrant swp:authority ex:Chris)
USING ex FOR <http://www.example.org/vocabulary#>
swp FOR <http://www.w3.org/2004/03/trix/swp-1/>
Query Results 1:
| No. | ?skill |
| 1. | Resource: http://www.example.org/vocabulary#Management |
The query uses three graph patterns. The variable ?graph is bound to the names of all graphs that contain information about Monica's skills. The second and third patterns restrict ?graph to graphs which have been asserted by Chris. Note that it doesn't matter whether the assertedBy and the authority information are contained in the same graph or in different graphs.
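The way the three patterns of Query 1 interact can be sketched in Python. This is only an illustration, not the TriQL.P engine: the dictionary-based store, the function names, the graph `:G4` and the skill `ex:Cooking` are assumptions introduced here. The second and third patterns are matched against all graphs, while the first pattern is matched only inside the graph bound to ?graph.

```python
# Named graphs as a mapping: graph name -> set of (subject, predicate, object).
GRAPHS = {
    ":G1": {
        ("ex:Monica", "ex:skill", "ex:Programming"),
        ("ex:Monica", "ex:skill", "ex:Management"),
    },
    ":G3": {
        (":G1", "swp:assertedBy", "_:w1"),
        ("_:w1", "swp:authority", "ex:Chris"),
    },
    # Hypothetical graph that nobody has asserted.
    ":G4": {("ex:Monica", "ex:skill", "ex:Cooking")},
}


def all_triples(graphs):
    """Yield every triple, ignoring which graph contains it."""
    for triples in graphs.values():
        yield from triples


def skills_asserted_by(graphs, person, authority):
    """Return the skills of `person` found in graphs asserted by `authority`."""
    everything = set(all_triples(graphs))
    # Third pattern: warrants whose authority is the requested one.
    warrants = {s for (s, p, o) in everything
                if p == "swp:authority" and o == authority}
    # Second pattern: graphs asserted by one of those warrants.
    asserted = {s for (s, p, o) in everything
                if p == "swp:assertedBy" and o in warrants}
    # First pattern: skill triples inside a graph bound to ?graph.
    return sorted(o for name in asserted
                  for (s, p, o) in graphs.get(name, ())
                  if s == person and p == "ex:skill")


print(skills_asserted_by(GRAPHS, "ex:Monica", "ex:Chris"))
```

The skill stated in the unasserted graph `:G4` is not returned, because `:G4` never becomes a binding for ?graph.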
Example Query 2: Get Monica's skills. Use only information that has been asserted by Chris after "2003-01-01".
SELECT ?skill
WHERE ?graph ( doc:Monica ex:skill ?skill )
(?graph swp:assertedBy ?warrant .
?warrant swp:authority doc:Chris .
?warrant dc:date ?date )
AND ?date > "2003-01-01"^^xsd:date
USING ex FOR <http://www.example.org/vocabulary#>
xsd FOR <http://www.w3.org/2001/XMLSchema#>
swp FOR <http://www.w3.org/2004/03/trix/swp-1/>
doc FOR <http://www.example.org/exampleDocument#>
dc FOR <http://purl.org/dc/elements/1.1/>
This query uses an AND condition to restrict ?date to values greater than "2003-01-01". Note that the assertedBy, the authority and the date information have to be contained in the same graph this time, because all three triple patterns belong to one graph pattern.
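The same-graph requirement and the date condition of Query 2 can be illustrated with a small Python sketch. The store layout, the function names and the graphs `:G2` and `:G4` with their contents are assumptions made for this example. The warrant, its authority and its date are only accepted when all three triples are found in one and the same graph.

```python
from datetime import date

GRAPHS = {
    ":G1": {("ex:Monica", "ex:skill", "ex:Management")},
    ":G2": {("ex:Monica", "ex:skill", "ex:Cooking")},
    ":G3": {
        (":G1", "swp:assertedBy", "_:w1"),
        ("_:w1", "swp:authority", "ex:Chris"),
        ("_:w1", "dc:date", date(2003, 10, 2)),
    },
    # Hypothetical older assertion that the date condition should reject.
    ":G4": {
        (":G2", "swp:assertedBy", "_:w2"),
        ("_:w2", "swp:authority", "ex:Chris"),
        ("_:w2", "dc:date", date(2002, 5, 1)),
    },
}


def graphs_asserted_after(graphs, authority, cutoff):
    """Names of graphs asserted by `authority` with a warrant dated after `cutoff`."""
    accepted = set()
    for triples in graphs.values():  # one candidate graph per iteration
        for (graph_name, predicate, warrant) in triples:
            if predicate != "swp:assertedBy":
                continue
            # Authority and date must be stated in the same graph as assertedBy.
            has_authority = (warrant, "swp:authority", authority) in triples
            dates = [o for (s, p, o) in triples
                     if s == warrant and p == "dc:date"]
            if has_authority and any(d > cutoff for d in dates):
                accepted.add(graph_name)
    return accepted


def skills(graphs, person, authority, cutoff):
    """Skills of `person` taken only from the accepted graphs."""
    accepted = graphs_asserted_after(graphs, authority, cutoff)
    return sorted(o for name in accepted
                  for (s, p, o) in graphs.get(name, ())
                  if s == person and p == "ex:skill")


print(skills(GRAPHS, "ex:Monica", "ex:Chris", date(2003, 1, 1)))
```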
Example Query 3: Get Monica's skills. Use only information that has been asserted by the set of people I trust.
SELECT ?skill
WHERE ?graph1 ( ex:Monica ex:skill ?skill )
( ?graph1 swp:assertedBy ?warrant .
?warrant swp:authority ?authority )
?graph2 ( ex:Chris ex:trusts ?authority .
?graph2 swp:assertedBy ?graph2 .
?graph2 swp:authority ex:Chris )
USING ex FOR <http://www.example.org/vocabulary#>
swp FOR <http://www.w3.org/2004/03/trix/swp-1/>
Signing Named Graphs is discussed in [CaBiHaSt04]. The following query assumes an architecture in which the crawler verifies signatures during the information gathering process and adds the information that a signature was verifiable, together with the graph, to the local knowledge base.
Example Query 4: Get Monica's skills. Use only information that has been signed by information providers whose signatures could be verified in the information gathering process.
SELECT ?skill
WHERE ?graph1 ( ex:Monica ex:skill ?skill )
( ?graph1 swp:assertedBy ?warrant )
?graph2 ( ?warrant swp:signatureVerifiedBy ex:Crawler .
?graph2 swp:assertedBy ?graph2 .
?graph2 swp:authority ex:Crawler )
USING ex FOR <http://www.example.org/vocabulary#>
swp FOR <http://www.w3.org/2004/03/trix/swp-1/>
Example Query 5: The example TriQL.P query below retrieves all persons with the skill "Programming", based only on claims by people who have an affiliation to at least 2 projects involving programming.
SELECT ?person
WHERE ?graph (?person km:skill km:Programming .
?person rdf:type km:Person )
(?graph swp:assertedBy ?warrant .
?warrant swp:authority ?author )
(?author km:affiliation ?project )
(?project rdf:type km:Project .
?project km:topic km:Programming )
AND COUNT(?project) > 1
USING km FOR <http://www.example.org/vocabulary#>
rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
swp FOR <http://www.w3.org/2004/03/trix/swp-1/>
A more sophisticated trust policy would require background information to be signed and claimed by a group of trusted information providers.
Example Query 6: Removed in version 2004-09-06.
Reputation-Based Trust Mechanisms include rating systems like the one used by eBay and Web-Of-Trust mechanisms. All trust architectures proposed for the Semantic Web so far fall into this category. The general problem with these approaches is that they require explicit and topic-specific trust ratings and that providing such ratings and keeping them up-to-date puts an unrealistically heavy burden on information consumers.
Example Query 7: Get all vendors of product XY which got more positive than negative ratings.
SELECT ?vendor
WHERE ?graph1 ( ?vendor ex:offers ex:ProductXY )
AND METRIC(ex:MorePositiveRatings, ?vendor)
USING ex FOR <http://www.example.org/vocabulary#>
The METRIC clause uses the ex:MorePositiveRatings metric to test whether a vendor has more positive than negative ratings. The metric is implemented as a plug-in for the TriQL.P engine.
Background information about Webs of Trust can be found in [GoHePa03][RiAgDo03].
Example Query 8: Get Monica's skills. Use only information that has been asserted by people I know through my Web-of-Trust.
SELECT ?skill
WHERE ?graph1 ( ex:Monica ex:skill ?skill )
( ?graph1 swp:assertedBy ?warrant .
?warrant swp:authority ?authority )
AND METRIC(ex:Part-of-Web-of-Trust, ?authority, ?env_user, ex:trustPolicyforWOT)
USING ex FOR <http://www.example.org/vocabulary#>
swp FOR <http://www.w3.org/2004/03/trix/swp-1/>
The METRIC clause tests whether ?authority is part of ?env_user's Web-of-Trust. The ex:Part-of-Web-of-Trust metric is implemented as a plug-in for the TriQL.P engine. The environment variable ?env_user sets the user for whom the Web-of-Trust is calculated. The last parameter is the URI of the trust policy used to gather the Web-of-Trust ratings.
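How a metric such as ex:Part-of-Web-of-Trust might decide membership is not specified in this document. One plausible reading, shown here purely as an assumption, is reachability over direct trust statements, limited to a maximum path length. The trust data, the function name and the `max_depth` parameter are hypothetical.

```python
from collections import deque

# Hypothetical direct trust statements: who trusts whom.
TRUSTS = {
    "ex:Me": {"ex:Chris"},
    "ex:Chris": {"ex:Andy"},
    "ex:Andy": set(),
    "ex:Mallory": {"ex:Me"},
}


def part_of_web_of_trust(authority, user, trusts, max_depth=3):
    """Return True if `authority` is reachable from `user` within `max_depth` hops."""
    seen = {user}
    queue = deque([(user, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow trust edges beyond the depth limit
        for trusted in trusts.get(node, ()):
            if trusted == authority:
                return True
            if trusted not in seen:
                seen.add(trusted)
                queue.append((trusted, depth + 1))
    return False


print(part_of_web_of_trust("ex:Andy", "ex:Me", TRUSTS))     # reachable via ex:Chris
print(part_of_web_of_trust("ex:Mallory", "ex:Me", TRUSTS))  # trust is not symmetric
```

A breadth-first search keeps the shortest trust path first, which makes a depth limit easy to enforce. Real Web-of-Trust metrics usually also weight the edges.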
Example Query 9: Get Monica's skills. Only use information from authors who have a trust rating of at least 0.8, determined through my Web-of-Trust.
SELECT ?skill
WHERE ?graph1 ( ex:Monica ex:skill ?skill )
( ?graph1 swp:assertedBy ?warrant .
?warrant swp:authority ?authority )
AND METRIC(ex:Web-of-Trust-Ranking, ?authority, ?env_user, "0.8", ex:trustPolicyforWOT)
USING ex FOR <http://www.example.org/vocabulary#>
swp FOR <http://www.w3.org/2004/03/trix/swp-1/>
Content-based trust policies do not use metadata about information, but rules and axioms together with the information content itself and related information about the same topic published by other authors.
Example Query 10: Get all news reports about topic X. Believe only reports which have been stated by at least 5 different authors.
SELECT ?newsreport
WHERE ?graph1 ( ?newsreport rdf:type ex:NewsReport .
?newsreport ex:topic ex:X )
( ?graph1 swp:assertedBy ?warrant .
?warrant swp:authority ?authority )
AND COUNT(?authority) > 4
USING ex FOR <http://www.example.org/vocabulary#>
rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
swp FOR <http://www.w3.org/2004/03/trix/swp-1/>
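The effect of the COUNT condition in Query 10 can be sketched as grouping the pattern matches by news report and counting the distinct authorities per report. The input format (pairs of report and authority), the sample names and the function name are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical (report, authority) pairs, one per match of the graph patterns.
CLAIMS = [
    ("ex:Report1", "ex:A"), ("ex:Report1", "ex:B"), ("ex:Report1", "ex:C"),
    ("ex:Report1", "ex:D"), ("ex:Report1", "ex:E"),
    ("ex:Report2", "ex:A"), ("ex:Report2", "ex:A"), ("ex:Report2", "ex:B"),
]


def corroborated_reports(claims, minimum):
    """Return the reports stated by at least `minimum` distinct authorities."""
    authors = defaultdict(set)
    for report, authority in claims:
        authors[report].add(authority)  # a set ignores repeated claims by one author
    return sorted(r for r, who in authors.items() if len(who) >= minimum)


# COUNT(?authority) > 4 corresponds to at least 5 distinct authorities.
print(corroborated_reports(CLAIMS, 5))
```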
Example Query 11: Get the names of all elephants. Believe that something is an elephant only when it claims to be an elephant, has four legs and weighs over 1000 kg.
SELECT ?name
WHERE ( ?elephant ex:name ?name .
?elephant rdf:type ex:elephant .
?elephant ex:weight ?weight .
?elephant ex:legs ?legs )
AND ?weight > 1000 && ?legs == 4
USING ex FOR <http://www.example.org/vocabulary#>
rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
swp FOR <http://www.w3.org/2004/03/trix/swp-1/>
Users who query information from various independent sources need to decide which information in the query result they trust. The key factor in this trust decision is an understanding of information provenance and information ratings. Justification trees attached to TriQL.P query results try to facilitate this understanding. They contain information about why results match the query goals. Applications can use the justification tree to explain why retrieved information fulfils the quality and trust requirements formulated within the query. Using information from the justification tree, it is possible to implement Tim Berners-Lee's "Oh, yeah?"-button [BernersLee97], meaning that a user can click on any piece of information within an application and get an explanation of why she should trust it.
TriQL.P returns variable bindings together with a justification tree for each set of bindings. A justification tree contains the matching bindings for each pattern in the pattern tree. The justification tree returned by example query 5 would contain information about the authors and their projects. A justification tree attached to a binding returned by a query that uses a reputation-based trust mechanism would include all known ratings for the selected object.
Justification Tree Example 1:
Let's use query 5, which retrieves all persons with the skill "Programming", based only on claims by people who have an affiliation to at least 2 projects involving programming, as an example.
SELECT ?person
WHERE ?graph (?person km:skill km:Programming .
?person rdf:type km:Person )
(?graph swp:assertedBy ?warrant .
?warrant swp:authority ?author )
(?author km:affiliation ?project )
(?project rdf:type km:Project .
?project km:topic km:Programming )
AND COUNT(?project) > 1
USING km FOR <http://www.example.org/vocabulary#>
rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
swp FOR <http://www.w3.org/2004/03/trix/swp-1/>
The query could return two variable bindings:
| No. | ?person |
| 1. | Resource: http://www.example.org#Monica |
| 2. | Resource: http://www.example.org#Patrick |
In this case the following justification tree would be attached to the binding "?person = http://www.example.org#Monica" in order to explain why Monica has been selected:
?person = http://www.example.org#Monica
| Index | Claimed in Graph | Justification Bindings | Matching Pattern |
| 1. | ex:Graph1 | ?graph = ex:Graph1 ?person = ex:Monica | ?graph (?person km:skill km:Programming . ?person rdf:type km:Person) |
| 2. | ex:Graph2 | ?graph = ex:Graph1 | (?graph swp:assertedBy ?warrant . ?warrant swp:authority ?author ) |
| 3.1 | ex:Graph3 | ?author = ex:Chris ?project = ex:projectInterval | (?author km:affiliation ?project ) |
| 3.2 | ex:Graph3 | ?author = ex:Chris ?project = ex:projectKnowledgeNet | (?author km:affiliation ?project ) |
| 4.1 | ex:Graph4 | ?project = ex:projectInterval | (?project rdf:type km:Project . ?project km:topic km:Programming ) |
| 4.2 | ex:Graph5 | ?project = ex:projectKnowledgeNet | (?project rdf:type km:Project . ?project km:topic km:Programming ) |
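An application that consumes such a result needs some in-memory representation of the tree. The following Python sketch shows one possible shape, assuming a flat list of justification entries per binding. The class and field names are not defined by TriQL.P and are chosen here only for illustration; the rows are taken from the first entries of the table above.

```python
from dataclasses import dataclass, field


@dataclass
class Justification:
    index: str      # position in the tree, e.g. "3.1"
    graph: str      # graph in which the matching triples were claimed
    bindings: dict  # variable bindings produced by this match
    pattern: str    # the graph pattern that was matched


@dataclass
class Result:
    binding: dict
    justifications: list = field(default_factory=list)

    def explain(self):
        """Render the result and its justifications as indented text."""
        lines = [", ".join(f"{k} = {v}" for k, v in self.binding.items())]
        for j in self.justifications:
            bound = ", ".join(f"{k} = {v}" for k, v in j.bindings.items())
            lines.append(f"  {j.index} [{j.graph}] {bound}")
        return "\n".join(lines)


monica = Result({"?person": "ex:Monica"}, [
    Justification("1.", "ex:Graph1",
                  {"?graph": "ex:Graph1", "?person": "ex:Monica"},
                  "?graph (?person km:skill km:Programming . ?person rdf:type km:Person)"),
    Justification("3.1", "ex:Graph3",
                  {"?author": "ex:Chris", "?project": "ex:projectInterval"},
                  "(?author km:affiliation ?project)"),
    Justification("3.2", "ex:Graph3",
                  {"?author": "ex:Chris", "?project": "ex:projectKnowledgeNet"},
                  "(?author km:affiliation ?project)"),
])

print(monica.explain())
```

Keeping the matched pattern next to each binding lets a user interface show both what was believed and which part of the trust policy it satisfied.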
Justification Tree Example 2:
The following query retrieves all vendors of product XY and ranks the results by the number of positive vendor ratings.
SELECT ?vendor
WHERE ?graph1 ( ?vendor ex:offers ex:ProductXY )
?graph2 ( ?vendor ex:rating ex:positive )
ORDER BY COUNT(?graph2)
USING ex FOR <http://www.example.org/vocabulary#>
The query could return the following variable bindings and ranking scores:
| No. | ?vendor | Ranking Score |
| 1. | Resource: http://www.example.org#Shop22 | 4 |
| 2. | Resource: http://www.example.org#Shop100 | 2 |
Shop22 appears at the top of the list because it received four positive ratings. The justification tree attached to "?vendor = http://www.example.org#Shop22" would list these ratings:
?vendor = http://www.example.org#Shop22
| Index | Claimed in Graph | Justification Bindings | Matching Pattern |
| 1. | ex:Graph1 | ?graph1 = ex:Graph1 ?vendor = ex:Shop22 | ?graph1 ( ?vendor ex:offers ex:ProductXY ) |
| 2.1 | ex:Graph2 | ?graph2 = ex:Graph2 | ?graph2 ( ?vendor ex:rating ex:positive ) |
| 2.2 | ex:Graph3 | ?graph2 = ex:Graph3 | ?graph2 ( ?vendor ex:rating ex:positive ) |
| 2.3 | ex:Graph4 | ?graph2 = ex:Graph4 | ?graph2 ( ?vendor ex:rating ex:positive ) |
| 2.4 | ex:Graph5 | ?graph2 = ex:Graph5 | ?graph2 ( ?vendor ex:rating ex:positive ) |
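The ranking in this example can be reproduced with a short Python sketch that counts, per vendor, the distinct graphs containing a positive rating and sorts by that count. The descending sort order is an assumption derived from the result table above, and the graphs `ex:Graph6` to `ex:Graph8`, like the function name, are hypothetical.

```python
from collections import defaultdict

OFFERS = {"ex:Shop22", "ex:Shop100"}  # vendors offering ex:ProductXY

# (graph, vendor, rating); Graph6 to Graph8 are hypothetical.
RATINGS = [
    ("ex:Graph2", "ex:Shop22", "ex:positive"),
    ("ex:Graph3", "ex:Shop22", "ex:positive"),
    ("ex:Graph4", "ex:Shop22", "ex:positive"),
    ("ex:Graph5", "ex:Shop22", "ex:positive"),
    ("ex:Graph6", "ex:Shop100", "ex:positive"),
    ("ex:Graph7", "ex:Shop100", "ex:positive"),
    ("ex:Graph8", "ex:Shop100", "ex:negative"),
]


def rank_vendors(offers, ratings):
    """Return (vendor, score) pairs, highest number of positive ratings first."""
    support = defaultdict(set)
    for graph, vendor, rating in ratings:
        if vendor in offers and rating == "ex:positive":
            support[vendor].add(graph)  # these graphs are the ?graph2 bindings
    # Sort by descending score, then by vendor name to keep ties stable.
    return sorted(((v, len(g)) for v, g in support.items()),
                  key=lambda item: (-item[1], item[0]))


print(rank_vendors(OFFERS, RATINGS))
```

The sets collected in `support` are exactly the graphs a justification tree would list under entries 2.1 to 2.4.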
Justification trees are, of course, not understandable for an ordinary user of an application. Application developers have to develop ways to display parts of the tree as an explanation of why the data used by the application can be trusted. TBL's "Oh, yeah?"-button [BernersLee97] could be a good approach for this. When the button is pressed, an explanation window could pop up.
Approaches similar to justification trees are also being researched within the database and data warehousing community, where the topic is called "Lineage Tracing". An interesting example is the set of lineage features of the Stanford WHIPS data warehousing prototype. They are described, together with some links to related work, in [CuWi99].
Explanations offered by the justification tree differ from the deductive proof traces known from logic and classical AI. A proof trace explains the rules and ground facts which were used to infer query results. An interesting distributed proof infrastructure based on Semantic Web technologies is described in [McGuinnessDaSilva03]. Compared to proof traces, the justification tree approach is more data oriented, while proofs are logic oriented.
The METRIC clause provides an open interface for different trust metrics and algorithms. The result of the evaluation of a METRIC clause is True or False. If a reputation metric returns a scalar value, then a threshold has to be specified in the TriQL.P query (see example query 9).
Interface
Implementation
METRICs are implemented as plug-ins for the TriQL.P query engine. They can independently access the knowledge base to get ratings and other information.
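The plug-in interface itself is not specified in this document. As a rough sketch under that caveat, metrics could be registered under their URI and called with the arguments of the METRIC clause, always yielding a Boolean. The registry, the decorator, the rating data and the trust scores below are assumptions introduced for illustration; the policy URI is accepted but ignored.

```python
METRICS = {}  # metric URI -> callable returning a truth value


def metric(name):
    """Register a function as the plug-in for the metric called `name`."""
    def register(fn):
        METRICS[name] = fn
        return fn
    return register


# Hypothetical knowledge base contents the plug-ins read from.
RATINGS = {"ex:Shop22": [1, 1, 1, -1], "ex:Shop100": [1, -1, -1]}
TRUST_SCORES = {("ex:Me", "ex:Chris"): 0.9, ("ex:Me", "ex:Peter"): 0.4}


@metric("ex:MorePositiveRatings")
def more_positive_ratings(vendor):
    votes = RATINGS.get(vendor, [])
    return sum(1 for v in votes if v > 0) > sum(1 for v in votes if v < 0)


@metric("ex:Web-of-Trust-Ranking")
def web_of_trust_ranking(authority, user, threshold, policy):
    # A scalar metric becomes Boolean by comparing against the query's threshold.
    return TRUST_SCORES.get((user, authority), 0.0) >= float(threshold)


def evaluate_metric(name, *args):
    """Evaluate a METRIC clause; the result is always True or False."""
    return bool(METRICS[name](*args))


print(evaluate_metric("ex:MorePositiveRatings", "ex:Shop22"))
print(evaluate_metric("ex:Web-of-Trust-Ranking", "ex:Chris", "ex:Me", "0.8",
                      "ex:trustPolicyforWOT"))
```

Looking metrics up by URI keeps the query language independent of any particular trust algorithm, which matches the open-interface goal stated above.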
This section defines the TriQL.P grammar. It extends Andy Seaborne's RDQL grammar defined in [RDQL].
| QuotedURI | ::= | '<' URI characters (from RFC 2396) '>' | |
| NSPrefix | ::= | NCName As defined in XML Namespace v1.1 and XML 1.1 | |
| LocalPart | ::= | NCName As defined in XML Namespace v1.1 and XML 1.1 | |
| SELECT | ::= | 'SELECT' | Case Insensitive match |
| FROM | ::= | 'FROM' | Case Insensitive match |
| SOURCE | ::= | 'SOURCE' | Case Insensitive match |
| WHERE | ::= | 'WHERE' | Case Insensitive match |
| AND | ::= | 'AND' | Case Insensitive match |
| ORDERBY | ::= | 'ORDER BY' | Case Insensitive match |
| USING | ::= | 'USING' | Case Insensitive match |
| Identifier | ::= | ([a-z][A-Z][0-9][-_.])+ | |
| EOF | ::= | End of file | |
| COMMA | ::= | ',' | |
| INTEGER_LITERAL | ::= | ([0-9])+ | |
| FLOATING_POINT_LITERAL | ::= | ([0-9])*'.'([0-9])+('e'('+'|'-')?([0-9])+)? | |
| STRING_LITERAL1 | ::= | '"'UTF-8 characters'"' (with escaped \") | |
| STRING_LITERAL2 | ::= | "'"UTF-8 characters"'" (with escaped \') | |
| STRING_LITERAL3 | ::= | "'"UTF-8 characters"'" | |
| LPAREN | ::= | '(' | |
| RPAREN | ::= | ')' | |
| DOT | ::= | '.' | |
| GT | ::= | '>' | |
| LT | ::= | '<' | |
| BANG | ::= | '!' | |
| TILDE | ::= | '~' | |
| HOOK | ::= | '?' | |
| COLON | ::= | ':' | |
| EQ | ::= | '==' | |
| NEQ | ::= | '!=' | |
| LE | ::= | '<=' | |
| GE | ::= | '>=' | |
| SC_OR | ::= | '||' | |
| SC_AND | ::= | '&&' | |
| STR_EQ | ::= | 'EQ' | Case Insensitive match |
| STR_NE | ::= | 'NE' | Case Insensitive match |
| PLUS | ::= | '+' | |
| MINUS | ::= | '-' | |
| STAR | ::= | '*' | |
| SLASH | ::= | '/' | |
| REM | ::= | '%' | |
| STR_MATCH | ::= | '=~' | '~~' | |
| STR_NMATCH | ::= | '!~' | |
| DATATYPE | ::= | '^^' | |
| AT | ::= | '@' | |
| COUNT | ::= | 'COUNT' | Case Insensitive match |
| IN | ::= | 'IN' | Case Insensitive match |
| WITH | ::= | 'WITH' | Case Insensitive match |
| BOOLEAN_METRIC | ::= | 'BOOLEAN_METRIC' | Case Insensitive match |
| DEGREE_METRIC | ::= | 'DEGREE_METRIC' | Case Insensitive match |
References to lexical tokens are enclosed in <>. Whitespace is skipped.
Notes: The term "literal" refers to a constant value, and not only an RDF Literal.
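To make the token definitions more concrete, the following Python sketch tokenizes a small subset of them with regular expressions. It is not the TriQL.P lexer: only some tokens are covered, keywords are matched case-insensitively as the table requires, whitespace is skipped, and the composite tokens `VAR` (HOOK followed by an identifier) and `QNAME` (NSPrefix, COLON, LocalPart) are simplifications introduced here.

```python
import re

# Order matters: longer operators come before their single-character prefixes.
TOKEN_SPEC = [
    ("QuotedURI", r"<[^<>\s]*>"),
    ("QNAME", r"[A-Za-z_][\w-]*:[\w-]+"),          # hypothetical composite token
    ("SELECT", r"(?i:SELECT)\b"),
    ("WHERE", r"(?i:WHERE)\b"),
    ("AND", r"(?i:AND)\b"),
    ("USING", r"(?i:USING)\b"),
    ("COUNT", r"(?i:COUNT)\b"),
    ("VAR", r"\?[A-Za-z0-9_]+"),                   # hypothetical composite token
    ("FLOATING_POINT_LITERAL", r"[0-9]*\.[0-9]+(?:e[+-]?[0-9]+)?"),
    ("INTEGER_LITERAL", r"[0-9]+"),
    ("GE", r">="), ("LE", r"<="), ("EQ", r"=="), ("NEQ", r"!="),
    ("SC_AND", r"&&"), ("SC_OR", r"\|\|"),
    ("GT", r">"), ("LT", r"<"),
    ("LPAREN", r"\("), ("RPAREN", r"\)"),
    ("DOT", r"\."), ("COMMA", r","),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split `text` into (token name, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


query = "select ?name WHERE ( ?elephant ex:weight ?weight ) AND ?weight > 1000 && ?legs == 4"
for kind, lexeme in tokenize(query):
    print(kind, lexeme)
```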