Return-Path: X-Original-To: apmail-incubator-jena-commits-archive@minotaur.apache.org Delivered-To: apmail-incubator-jena-commits-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 872D89616 for ; Fri, 13 Jan 2012 13:12:27 +0000 (UTC) Received: (qmail 56082 invoked by uid 500); 13 Jan 2012 13:12:27 -0000 Delivered-To: apmail-incubator-jena-commits-archive@incubator.apache.org Received: (qmail 56058 invoked by uid 500); 13 Jan 2012 13:12:26 -0000 Mailing-List: contact jena-commits-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: jena-dev@incubator.apache.org Delivered-To: mailing list jena-commits@incubator.apache.org Received: (qmail 56043 invoked by uid 99); 13 Jan 2012 13:12:25 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 13 Jan 2012 13:12:25 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 13 Jan 2012 13:12:18 +0000 Received: from eris.apache.org (localhost [127.0.0.1]) by eris.apache.org (Postfix) with ESMTP id 1318F238897D for ; Fri, 13 Jan 2012 13:11:56 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: svn commit: r802536 - in /websites/staging/jena/trunk/content/jena/documentation: notes/sse.html tdb/optimizer.html tdb/tdb_transactions.html Date: Fri, 13 Jan 2012 13:11:55 -0000 To: jena-commits@incubator.apache.org From: buildbot@apache.org X-Mailer: svnmailer-1.0.8-patched Message-Id: <20120113131156.1318F238897D@eris.apache.org> X-Virus-Checked: Checked by ClamAV on apache.org Author: buildbot Date: Fri Jan 13 13:11:55 2012 New Revision: 802536 Log: Staging update by buildbot for jena Added: websites/staging/jena/trunk/content/jena/documentation/notes/sse.html Modified: websites/staging/jena/trunk/content/jena/documentation/tdb/optimizer.html websites/staging/jena/trunk/content/jena/documentation/tdb/tdb_transactions.html Added: websites/staging/jena/trunk/content/jena/documentation/notes/sse.html ============================================================================== --- websites/staging/jena/trunk/content/jena/documentation/notes/sse.html (added) +++ websites/staging/jena/trunk/content/jena/documentation/notes/sse.html Fri Jan 13 13:11:55 2012 @@ -0,0 +1,604 @@ + + + + + + + Apache Jena - SPARQL S-Expressions (or "SPARQL Syntax Expressions") + + + + + + + + + + +
+

SPARQL S-Expressions (or "SPARQL Syntax Expressions")

+

A way to write down data structures in an RDF-centric syntax.

+

But not an idea for another RDF serialization format.

+

Contents

+ +

Need

+

The SPARQL algebra +defines the semantics of a SPARQL graph pattern. Every SPARQL query +string (the syntax) is mapped to a SPARQL algebra expression.

+

It is convenient to be able to print out such algebra expressions for +discussion between people and for debugging. Further, if algebra +expressions can be read back in as well, testing of specific parts of an +implementation is also easier.

+

This is an example of a general problem : how to express data structures +where the basic elements of RDF are based on RDF nodes.

+

RDF itself is often the most appropriate way to do this, but sometimes +it isn't so convenient. An algebra expression is a tree, and order +matters.

+

When expressing a data structure, there are certain key structure that +need to be expressible: arrays and maps, then sets and bags, but +expression of a data structure is not the same as the high-level +semantics of the data structure.

+

A stack can be expressed as a list. And because we want to express the +structure, and not express the operations on the structures, data +structures that operational meaning don't enter the picture. There are +no operations, no push, pop or peek.

+

Note that this is to express a data structure, not encode or +repressent it. By express we mean communicate it, between people or +between cooperating machines. The structures are not completely +self-representing. But we do discuss a way to express in RDF that does +give a self-describing nature through the use of tagged structures.

+

Design Intent

+
    +
  • Concise (=> for people to write conveniently) format for data + structures
  • +
  • RDF-centric
  • +
+

Non-goals:

+
    +
  • to directly represent any data structure.
  • +
  • to be another RDF syntax.
  • +
+

So desirable features are:

+
    +
  • Concise syntax for RDF terms
  • +
  • Datastructures
  • +
+

Other Approaches

+

RDF

+

RDF is "map-centric" but not all data structures are conveniently +expressible in maps. RDF has lists, and these lists have convenient +syntax in Turtle or N3.

+

If your data structure fits the RDF paradigm, then RDF is a better +choice that SSE. Below is a possible mapping from SSE +to RDF as Turtle.

+

Lisp

+

Lacks convenient syntax for the RDF terms themselves.

+

SSE syntax is almost valid +Scheme; literal +language tags and datatypes get split a separate list symbols but the +information is recoverable. Scheme doesn't use [] lists or +single-quoted strings.

+

XML

+

Too verbose.

+

JSON

+

JSON provides values (strings, +numbers, booleans, null), arrays and object (which are maps). SPARQL +Query Results in +JSON +shows how JSON might be used. It describes how RDF terms are encoded +into further substructures. Alternatively, we could put encoded terms in +strings like "<http://w3.org/>" and have a parser-within-a-parser. But +both these approaches do not make the writing of RDF terms as easy as it +could be.

+

Design

+

S-expressions +using RDF terms.

+

The command arq.qparse --print=op --file queryFile will print the +SPARQL algebra for the query in SSE format.

+

Tokens

+

Tokens are the atomic elements of the syntax.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ExampleExplaination
"abc"string
"abc"@enstring with language tag.
123number, specifically an xsd;integer.
<http://example.org/>IRI (or URI).
_:abcblank node.
?xvariable
?variable
ex:thingprefixed name
ex:123prefixed name
SELECTsymbol
+symbol
@xyzsymbol
+

For ? (no name), a unique, internal name for a fresh variable will be +allocated; every use of ? is a different variable.

+

??x creates a non-distinguished variable. ?? creates a fresh +non-distinguished variable.

+

_: creates a fresh blank node.

+

@xyz - this is a symbol because a language tags only follow a lexical +form.

+

Almost any sequence of characters which is not an RDF term or variable +is a symbol that can be given special meaning by processing software.

+

SSE Comments

+

# or ; introduce comments, which run to the end of line, including +the end-of-line characters.

+

SSE Escapes

+

\u and \U escape sequences for arbitrary Unicode codepoints. These +apply to the input character stream before parsing. They don't, for +example, permit a space in a symbol.

+

Strings provide \n, \t, \r, \b, \b, \f, \", \' and \\ +escape sequences as in SPARQL.

+

Structures

+

(?x ns:p "abc") - list of 3 elements: a variable, a prefixed name and +a string

+
(bgp 
+  [?x ns:p "abc"])
+
+ + +

A list of 2 elements: a symbol (bgp) and a list of 3 elements. Both +() and [] delimit lists; they must match but otherwise it's a free +choice. Convention is that compact lists use []; large lists use ().

+

Tagged Structures

+

The basic syntax defines tokens and lists. Higher level processing +happens on this basic syntax and can be extended by interpreting the +structure.

+

Layers on top of the basic abstract syntax produce specialised data +structures. This can be a transformation into a new SSE structure or the +production of programming language objects.

+

This is driven by tagged (data) objects in an SSE expression. The tag is +a symbol and the elements of the data object are the rest of the list.

+
(+ 1 2)
+
+ + +

is tagged with symbol +

+
(triple ?s ?p "text"@en)
+
+ + +

is tagged with symbol triple

+

IRI resolution

+

One such layer is IRI and prefix name resolution, using tags base and +prefix.

+

Basic syntax includes unresolved IRIs, (example <abc>) and prefixed +names (example foaf:name). These are turned into absolute IRIs and the +base and prefix tagged object wrappers are removed.

+

This is sufficiently important that the SSE library handles this in an +optimized fashion where the IRI processing directly rewrites the +streamed output of the parser.

+

base

+
(base <http://example/>
+   (triple <xyz> ?p "lex"^^<thing>))
+
+ + +

becomes

+
(triple <http://example/xyz> ?p "lex"^^<http://example/thing>)
+
+ + +

prefix

+
(prefix ((: <http://example/>)
+          (ns: <http://example/ns#>))
+     (triple :x ns:p "lex"^^ns:type))
+
+ + +

becomes

+
 (triple <http://example/x> <http://example/ns#p> "lex"^^<http://example/ns#type>)
+
+ + +

Nesting

+

The tagged structures can be combined and nested. The base or prefixes +declared only apply to the body of the data object.

+
 (prefix ((: <http://jena.hpl.hp.com/2007/>)
+             (foaf:  <http://xmlns.com/foaf/0.1/>))
+       (triple (base <http://jena.hpl.hp.com/> <afs> foaf:name "Andy")))
+
+ + +

Combined with the triple builder, this will produce a triple:

+
 <http://jena.hpl.hp.com/afs> <http://xmlns.com/foaf/0.1/name> "Andy" .
+
+ + + +

Not implemented

+

Not all data structures can be conveniently expressed as nested lists. +Sub-element sharing matters. A structure with shared elements can't be +serialized as a strict tree and some form of reference is needed.

+

Name a place in the structure: (name@ symbol X)

+

Link to it: (@link symbol)

+

The link layer will produce an SSE structure without these tags, having +replaced all name@ and @link with the shared structure X.

+

@ is a convention for referencing.

+

Building Java Objects

+

Builders are code classes that process the structure into Java objects. +Writing builders is straight-forward because low-level parsing details +have been taken care of in the basic syntax. A typical builder is a +recursive-decent parser over the abstract syntax tree, coding one is +primarily walking the structure, with a tagged object to Java instance +mapping being applied.

+

Some tagged objects with builders are:

+
    +
  • (triple S P O) where X is an RDF node (RDF term or variable).
  • +
  • (quad G S P O)
  • +
  • (graph triple*)
  • +
  • (graph@ URL) — Read a URL.
  • +
+

@@ Need to write the abstract syntax for each tagged object

+

Many builders have convenience syntax. Triples can be abbreviated by +omitting the tag triple because usually the fact it is a triple is +clear.

+
(bgp (triple ?s ?p ?o)) 
+(bgp (?s ?p ?o))
+
+ + +

Quads have a similar abbreviation as 4-lists. In addition, _ is a quad +on the default graph.

+

Elements for executing SPARQL:

+
    +
  • SPARQL algebra operators and basic graph patterns
  • +
  • Filter expressions (in prefix notation (+ 1 2))
  • +
  • Query solutions (Bindings) and tables.
  • +
+

SSE Factory

+

The class SSE in package com.hp.hpl.jena.sparql.sse provides many +convenience functions to call builders for RDF and SPARQL structures.

+
Node n = SSE.parseNode("<http://example/node>") ;
+Triple t = SSE.parseTriple("(?s ?p ?o)") ;
+Op op = SSE.parseOp("(filter (> ?v 123) (bgp (?s ?p ?v)))") ;
+
+ + +

Most of the operations have forms that allow a PrefixMapping to be +specified - this is wrapped around the parser run so prefixed names can +be used without explicit prefix declarations.

+

There is a default prefix mapping with a few common prefixes: rdf, +rdfs, owl, xsd and fn (the XPath/XQuery functions and operators +namespace).

+

Mapping to RDF

+

The syntax of SSE is very close to Turtle lists because the syntax for +IRIs and literals are the same.: to produce Turtle (outline):

+
    +
  1. Replace symbols by IRIs: prepend a common URI and %-encode any + characters necesary.
  2. +
  3. Replace variables by IRIs: prepend a common URI.
  4. +
  5. Move prefixes to be @prefix directives.
  6. +
  7. Put a dot at the end of the file.
  8. +
+

The result is an RDF model using only the properties rdf:first and +rdf:rest so it records the data structure, but not what hthe data +structure represents.

+

SSE Files

+

The file extension is .sse and all files are UTF-8.

+

A quick and pragmatic Emacs mode is given by:

+
;; ==== SSE mode
+(define-derived-mode sse-mode lisp-mode "SSE" nil
+   (make-local-variable 'lisp-indent-function)
+   (setq lisp-indent-function 'sse-indent-function)
+   )
+
+;; Everything in SSE is "def" like
+(defun sse-indent-function (indent-point state)
+  (lisp-indent-defform state indent-point))
+
+(setq auto-mode-alist
+       (cons '("\\.sse" . sse-mode) auto-mode-alist))
+
+ + +

Longer Examples

+

Query 1

+
PREFIX foaf:       <http://xmlns.com/foaf/0.1/>
+SELECT DISTINCT ?name ?nick
+{
+    ?x foaf:mbox <mailt:person@server> .
+    ?x foaf:name ?name 
+    OPTIONAL { ?x foaf:nick ?nick }
+}
+
+(prefix ((foaf: <http://xmlns.com/foaf/0.1/>))
+   (distinct
+     (project (?name ?nick)
+       (leftjoin
+         (BGP
+           [triple ?x foaf:mbox <mailto:person@server>]
+           [triple ?x foaf:name ?name]
+         )
+         (BGP [triple ?x foaf:nick ?nick])
+       ))))
+
+ + +

Complete SPARQL Execution

+

The following is a complete query execution, data and query. There is an +inline dataset and a query of

+
 PREFIX : <http://example/>
+ SELECT * 
+ {
+   GRAPH :g1 { ?x ?p ?v }
+ }
+
+ + +

The tag graph is used twice, with different meanings. First, for an +RDF graph, and second in GRAPH SPARQL pattern. In a data structrure, +context sorts out the different usages.

+
(prefix ((: <http://example/>))
+   (exec
+     (dataset
+       (default (graph
+          (:x :p 1)
+          (:x :p 2)))
+       (namedgraph :g1
+     (graph
+       (:x :gp 1)
+       (:x :gp 2)))
+       (namedgraph :g2
+     (graph
+       (:y :gp 1)
+       (:y :gp 2)))
+       )
+
+     (graph :g1
+       (bgp (?x ?p ?v)))
+     ))
+
+ + +

SSE Grammar

+

@@ insert grammar here

+
+ + + + + Modified: websites/staging/jena/trunk/content/jena/documentation/tdb/optimizer.html ============================================================================== --- websites/staging/jena/trunk/content/jena/documentation/tdb/optimizer.html (original) +++ websites/staging/jena/trunk/content/jena/documentation/tdb/optimizer.html Fri Jan 13 13:11:55 2012 @@ -207,10 +207,28 @@ be appeciated. Mail to

TDB chooses the basic graph pattern optimizer by the presence of a file in the database directory.

Optimizer control files

-

File name | Effect -none.opt | No reordering - execute triple patterns in the order in the query -fixed.opt | Use a built-in reordering based on the number of variables in a triple pattern. -stats.opt | The contents of this file are the weighing rules (see below).

+ + + + + + + + + + + + + + + + + + + + + +
File nameEffect
none.optNo reordering - execute triple patterns in the order in the query
fixed.optUse a built-in reordering based on the number of variables in a triple pattern.
stats.optThe contents of this file are the weighing rules (see below).

The contents of the files none.opt and fixed.opt are not read and don't matter. They can be zero-length files.

If more then one file is found, the choice is made: stats.opt Modified: websites/staging/jena/trunk/content/jena/documentation/tdb/tdb_transactions.html ============================================================================== --- websites/staging/jena/trunk/content/jena/documentation/tdb/tdb_transactions.html (original) +++ websites/staging/jena/trunk/content/jena/documentation/tdb/tdb_transactions.html Fri Jan 13 13:11:55 2012 @@ -176,8 +176,6 @@ for reverting a database to 0.8.X.

  • Multi JVM
  • Bulk loading
  • -

    Status

    -

    Transactions are part of the 0.9.0 release.

    Overview

    The transaction mechanism in TDB is based on write-ahead-logging. @@ -225,10 +223,10 @@ it should be used transactionally after

    Read transactions

    These are used for SPARQL queries and code using the Jena API actions that do not change the data. The general pattern is:

    -
    dataset.begin(ReadWrite.READ) ;
    -try {
    -  ...
    -} finally { dataset.end() ; }
    +
     dataset.begin(ReadWrite.READ) ;
    + try {
    +   ...
    + } finally { dataset.end() ; }
     
    @@ -259,13 +257,13 @@ actions that do not change the data. Th actions that modify the data. Beware that large model.read operations consume large amounts of temporary space.

    The general pattern is:

    -
    dataset.begin(ReadWrite.WRITE) ;
    -try {
    -  ...
    -  dataset.commit() ;
    -} finally { 
    +
     dataset.begin(ReadWrite.WRITE) ;
    + try {
    +   ...
    +   dataset.commit() ;
    + } finally { 
        dataset.end() ; 
    -}
    + }