incubator-general mailing list archives

From Leo Simons <m...@leosimons.com>
Subject [proposal] TripleSoup - a SPARQL endpoint for httpd
Date Mon, 29 Jan 2007 16:16:53 GMT
Hi all,

This is a proposal to start an RDF database server project at Apache.

What do you think?

cheers!

- Leo

----
= Summary =

TripleSoup is the simplest thing that you can do to turn your Apache
web server into a SPARQL endpoint.

TripleSoup will consist of an RDF [2] store [3], tooling to work with
that database, and a REST [4] web interface to talk to that database
using SPARQL [5], implemented as an Apache web server module.

{{{
Target:    TLP
Sponsor:   Incubator PMC
Champion:  Leo Simons <leosimons@apache.org>
Mentors:   Dirk-Willem van Gulik <dirkx@apache.org>,
            Ben Hyde <bhyde@apache.org>,
            Stefano Mazzocchi <stefano@apache.org>,
            Leo Simons <leosimons@apache.org>
Resources: SVN:     https://svn.apache.org/repos/asf/incubator/triplesoup/
            Website: http://incubator.apache.org/triplesoup/
            Jira:    http://issues.apache.org/jira/browse/TRIPLES
            Wiki:    http://wiki.apache.org/triplesoup/
            Mailing lists:
                     triplesoup-dev@incubator.apache.org
                     triplesoup-user@incubator.apache.org
                     triplesoup-commits@incubator.apache.org
                     triplesoup-private@incubator.apache.org
             Moderators: leosimons@apache.org
                         stefano@apache.org
                         dreid@apache.org
Initial committers:
            Dave Beckett <dave@dajobe.org>, redland author
            Dirk-Willem van Gulik <dirkx@apache.org>,
            Ben Hyde <bhyde@apache.org>,
            Stefano Mazzocchi <stefano@apache.org>,
            Andrea Marchesini <baku@theveniceproject.com>, b store author
            Alberto Reggiori <alberto@asemantics.com>, rdfstore author
            David Reid <dreid@apache.org>,
            Leo Simons <leosimons@apache.org>
Initial source:     mod_sparql, commercial triple store,
                     existing open source triple store
Known risks:        None
Technologies:       c
Reference:          http://wiki.apache.org/incubator/TripleSoupProposal
}}}

= Proposal details =

== Technology (basics) ==

What is RDF? It is just about any kind of data, represented as triples of
(subject, predicate, object), usually with a rich vocabulary describing the
semantics of the data (with the vocabulary typically also encoded as
triples).
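
To make the triple model concrete, here is a minimal sketch in Python. The
`Triple` type, the `match` helper and the in-memory `STORE` list are
illustrative assumptions and not part of any TripleSoup code. The book URI
and title are the ones used in the example later in this proposal; the
creator triple is made-up sample data.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF statement: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical sample data; a real store would hold far more triples.
STORE = [
    Triple("http://example.org/book/book6", DC_TITLE,
           "Harry Potter and the Half-Blood Prince"),
    Triple("http://example.org/book/book6", DC_CREATOR, "J.K. Rowling"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Roughly what the pattern { ?book dc:title ?title } asks for.
titles = [(t.subject, t.obj) for t in match(STORE, predicate=DC_TITLE)]
```

A real triple store adds indexing, persistence and a query planner on top of
this idea; the sketch only shows the shape of the data.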

This data can be serialized as RDF/XML or in other formats such as N3, and
it can be searched using the SPARQL query language. See [6] for an
overview.
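
As a rough illustration of what a line-oriented serialization looks like,
the sketch below writes one triple in the simple
`<subject> <predicate> "object" .` form, which is the N-Triples subset that
N3 parsers also accept. The function names are assumptions made for this
example, and the escaping covers only the most common cases.

```python
def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and newlines in a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def to_ntriples_line(subject: str, predicate: str, obj: str,
                     obj_is_uri: bool = False) -> str:
    """Format one triple as a single N-Triples style statement."""
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        obj_term = f'"{escape_literal(obj)}"'
    return f"<{subject}> <{predicate}> {obj_term} ."


line = to_ntriples_line(
    "http://example.org/book/book6",
    "http://purl.org/dc/elements/1.1/title",
    "Harry Potter and the Half-Blood Prince",
)
print(line)
```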

So if it is just some data in some format, why does it need a special
server? Because RDF data is fundamentally not constrained to a "file", and
there often is no "resource identifier" that readily identifies something
as a "document" which can be served up over HTTP.

So why the REST interface? RDF is one of the building blocks proposed for
the "semantic web", and that's why a system that works well with/over HTTP
is needed from the start.

== Technology (concrete) ==

This is just an example. Imagine that there is an application "someapp" on
the host foo.example.com which provides access to information about books,
and you want to get a list of those books (their URIs) and the names of the
books.

{{{
$ telnet foo.example.com 80
SELECT /someapp HTTP/1.0
Host: foo.example.com
Query-Language: http://www.w3.org/TR/2006/CR-rdf-sparql-query-20060406/
Accept: application/sparql-results+xml, rdf/xml, rdf/n3

PREFIX books:   <http://example.org/book/>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
SELECT ?book ?title
WHERE
   { ?book dc:title ?title }

HTTP/1.0 200 Ok
Content-Type: application/sparql-results+xml
Content-Length: 1234

<?xml version="1.0"?>
<sparql
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xs="http://www.w3.org/2001/XMLSchema#"
     xmlns="http://www.w3.org/2005/sparql-results#">
   <head>
     <variable name="book"/>
     <variable name="title"/>
   </head>
   <results ordered="false" distinct="false">
     <result>
       <binding name="book">
         <uri>http://example.org/book/book6</uri>
       </binding>
       <binding name="title">
         <literal>Harry Potter and the Half-Blood Prince</literal>
       </binding>
     </result>
   </results>
</sparql>

Connection closed by foo.example.com
$
}}}

It turns out there's only one book in the database in this example (sample
data taken from http://www.sparql.org/). David Reid already has some code
that does something much like this [7], implemented as an httpd module,
using the Redland library [11,12] as its backend store.
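
On the client side, a response body like the one in the example above could
be turned into plain rows with a few lines of code. The following is a
minimal sketch using Python's standard `xml.etree.ElementTree` module; the
`parse_results` helper is a name invented for this example and has nothing
to do with mod_sparql itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL query results XML format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# The response body from the example above.
RESPONSE_BODY = """<?xml version="1.0"?>
<sparql
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xs="http://www.w3.org/2001/XMLSchema#"
     xmlns="http://www.w3.org/2005/sparql-results#">
   <head>
     <variable name="book"/>
     <variable name="title"/>
   </head>
   <results ordered="false" distinct="false">
     <result>
       <binding name="book">
         <uri>http://example.org/book/book6</uri>
       </binding>
       <binding name="title">
         <literal>Harry Potter and the Half-Blood Prince</literal>
       </binding>
     </result>
   </results>
</sparql>"""


def parse_results(xml_text):
    """Return one dict per <result>, mapping variable name to its value."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            # The first child is the value element (<uri>, <literal>, ...).
            value = binding[0]
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


rows = parse_results(RESPONSE_BODY)
for row in rows:
    print(row["book"], "-", row["title"])
```

The sketch drops the distinction between URIs and literals; a real client
would probably keep the element type alongside each value.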

== What would you use TripleSoup for? ==

* It could be a backend for Piggy Bank [8].

* It could be a backend for the next version of Wikipedia.

* It could be a backend for an "open" version of iTunes or IMDB.

* It could be the backend for the information management system of the
Dutch ministry of water management [9].

* It could be the backend for projects.a.o [10] and similar applications.

* Most importantly, it could be a backend for dozens of useful, innovative
new projects that no-one has envisioned yet.

== The initial source ==

RDFstore is a standalone RDF storage system implemented as a C library,
licensed under the ASL 1.1. It has Perl bindings. Find its distribution at
[15].

mod_sparql [7] is an in-development Apache module that implements a SPARQL
endpoint. It is licensed under the Apache License 2.0. It uses Redland as a
backend. The SVN repository can be found at [7].

B is an in-development storage backend for Redland implemented as a
standalone C library. It is currently a closed source codebase. A code
snapshot can be found at [16].

== The initial committers ==

Dirk-Willem, Ben, Stefano, David and Leo are ASF members who hopefully need
no introduction.

Dave Beckett is the primary author of the Redland RDF application
framework.

Alberto Reggiori is the primary author of RDFstore, an RDF store developed
by asemantics [13], which will be contributed to TripleSoup. He is a partner
at asemantics.

Andrea Marchesini is the primary author of B, a storage backend for RDF
developed at Joost [14], which will be contributed to TripleSoup.

All initial committers have experience working on open source projects.
They work for at least five different companies.

== TripleSoup as an apache project ==

We think TripleSoup will have to reference dozens of specifications from
the W3C (XML, RDF, OWL, SPARQL, their standards for URIs, and more) and
from the IETF (HTTP, URL, URI, URN, and more), and will make use of or
integrate with quite a few existing open source projects (like the Redland
RDF libraries as well as Apache APR and httpd). As such, it seems like
TripleSoup should fit in really well at Apache.

The responses we got from various members of the RDF and semantic web
communities so far when discussing this proposal with them have all been
quite positive, and we expect and hope there'll be quite a few people
new to apache joining the project soon after it starts.

Most importantly, we think this project will be useful, innovative, and
fun!

= References =

{{{
[1] http://incubator.apache.org/
[2] http://www.w3.org/RDF/
[3] these are often called "triple stores"
[4] http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
[5] http://www.w3.org/TR/rdf-sparql-query/
[6] http://www.betaversion.org/~stefano/papers/ac2006.1.pdf
[7] http://david-reid.com/repos/public/mod_sparql/
[8] http://simile.mit.edu/wiki/Piggy_Bank
[9] http://www.wadi.nl/uk/
[10] http://projects.apache.org/
[11] http://www.librdf.net/
[12] http://svn.librdf.org/repository/
[13] http://www.asemantics.com/
[14] http://www.joost.com/
[15] http://rdfstore.sourceforge.net/downloads/RDFStore-0.51.tar.gz
[16] http://opensource.joost.com/libb/
}}}
