stanbol-dev mailing list archives

From Andy Seaborne <>
Subject Re: Future of Clerezza and Stanbol
Date Mon, 12 Nov 2012 21:40:42 GMT
On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
> On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <> wrote:
>> On 09/11/12 09:56, Rupert Westenthaler wrote:
>>> RDF libs:
>>> ====
>>> From the viewpoint of Apache Stanbol one needs to ask the question
>>> whether it makes sense to maintain its own RDF API. I expect the
>>> Semantic Web standards to evolve quite a bit in the coming years, and
>>> I have concerns about whether the Clerezza RDF modules will be
>>> updated/extended to provide implementations of those. One example of
>>> such a situation is SPARQL 1.1, which has been around for quite some
>>> time and is still not supported by Clerezza. While I do like the small
>>> API, the flexibility to use different TripleStores, and that Clerezza
>>> comes with OSGi support, I think given the current situation we would
>>> need to discuss all options, and those also include a switch to Apache
>>> Jena or Sesame. Sesame especially would be an attractive option, as its
>>> RDF Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>>> counterparts (Model [2] and Graph [3]) are considerably different and
>>> more complex interfaces. In addition, Jena will only change to
>>> org.apache packages with the next major release, so a switch before
>>> that release would mean two incompatible API changes.
>> Jena isn't changing the packaging as such -- what we've discussed is
>> providing a package for the current API and then a new, org.apache API.
>> The new API may be much the same as the existing one or it may be
>> different - that depends on contributions made!
> I didn't know about Jena planning to introduce such a common API.
>> I'd like to hear more about your experiences esp. with Graph API as that
>> is supposed to be quite simple - it's targeted at storage extensions as
>> well as supporting the richer Model API.  Personally, aside from the fact
>> that Clerezza enforces slot constraints (no literals as subjects), the Jena
>> Graph API and Clerezza RDF core API seem reasonably aligned.
> Yes, the slot constraints come from the RDF abstract syntax. In my opinion
> it's something one could decide to relax; by adding appropriate owl:sameAs
> bnodes any graph could be transformed into an rdf-abstract-syntax compliant
> one. So maybe have a GenericTripleCollection that can be converted to an
> RDFTripleCollection - not sure. Just sticking to the spec and waiting till
> this is allowed by the abstract syntax might be the easiest.

At the core, unconstrained slots have worked best for us.

Then either:

1/ Have a test the application applies before treating a triple as 
standard RDF, e.g. rejecting a literal in the subject slot, or

2/ Layer an app API to impose the constraints (but it's easy to run out 
of good names).
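A minimal sketch of both options over an unconstrained core. All names here are invented for illustration; this is not the Jena or Clerezza API.

```java
// Option 1: an unconstrained core triple plus a validity test.
// Option 2: a thin app-level layer that refuses non-RDF triples
// at construction time. Illustrative types only.
public class SlotConstraints {
    enum Kind { IRI, BNODE, LITERAL }
    record Node(Kind kind, String label) {}
    record Triple(Node s, Node p, Node o) {}

    // Option 1: a test over the general triple.
    static boolean isConcrete(Triple t) {
        return t.s().kind() != Kind.LITERAL && t.p().kind() == Kind.IRI;
    }

    // Option 2: a wrapper enforcing the abstract-syntax constraint.
    record StrictTriple(Triple t) {
        StrictTriple {
            if (!isConcrete(t))
                throw new IllegalArgumentException("not RDF abstract syntax");
        }
    }

    public static void main(String[] args) {
        Node lit = new Node(Kind.LITERAL, "\"x\"");
        Node p   = new Node(Kind.IRI, "http://example/p");
        Node o   = new Node(Kind.IRI, "http://example/o");
        System.out.println(isConcrete(new Triple(lit, p, o))); // false
        System.out.println(isConcrete(new Triple(o, p, lit))); // true
        // Option 2 in action: construction fails for a generalized triple.
        try {
            new StrictTriple(new Triple(lit, p, o));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");
        }
    }
}
```

The wrapper shows where "running out of good names" bites: the natural names are already taken by the core layer.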

The Graph/Node/Triple level in Jena is an API, but its primary role is 
the other side, facing storage and inference, not apps.

Generality gives:
A/ Future proofing (not perfect)
B/ What arises naturally in inference and query
C/ The ability to use RDF structures for processing RDF

Nodes in triples can be variables, and I would have found it useful to 
have marker nodes to be able to build structures e.g. "known to be bound 
at this point in a query".  As it was, I ended up creating parallel 
structures.
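The generalized-slot idea can be sketched like this; the types and the variables(...) helper are invented for illustration (Jena's real analogue is its Node_Variable):

```java
// Generalized triples in a query engine: slots may hold variables
// as well as RDF terms. Illustrative types, not an actual API.
import java.util.*;

public class Generalized {
    enum Kind { IRI, BNODE, LITERAL, VARIABLE }
    record Node(Kind kind, String label) {}
    record Triple(Node s, Node p, Node o) {}

    // Collect the variables in a pattern -- the kind of "known to be
    // bound at this point" bookkeeping described above.
    static List<String> variables(Triple t) {
        List<String> vars = new ArrayList<>();
        for (Node n : List.of(t.s(), t.p(), t.o()))
            if (n.kind() == Kind.VARIABLE) vars.add(n.label());
        return vars;
    }

    public static void main(String[] args) {
        Triple pattern = new Triple(
            new Node(Kind.VARIABLE, "?s"),
            new Node(Kind.IRI, "http://example/p"),
            new Node(Kind.VARIABLE, "?o"));
        System.out.println(variables(pattern)); // [?s, ?o]
    }
}
```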

> Where I see advantages of the clerezza API:
> - Based on the collections framework, so standard tools can be used for graphs

Given a core system API, Scala and Clojure APIs, and even different Java 
APIs for different styles, are all possible.

A universal API across systems is about plugging in machinery (parsers, 
query engines, storage, inference).  It's good to separate that from 
application APIs, otherwise there is a design tension.

> - Immutable graphs follow the identity criterion of RDF semantics; this
> allows graph components to be added to sets and diff and patch algorithms
> to be implemented more straightforwardly
> - BNodes have no ids: apart from promoting the usage of URIs where this
> is appropriate, it allows behind-the-scenes leanification and saves memory
> where the backend doesn't have such ids.

We have argued about this before.

+ As you have objects, there is a concept of identity (you can tell two 
bNodes apart).
+ For persistence, an internal id is necessary to reconstruct 
consistently with caches.
+ Leaning isn't a core feature of RDF.  In fact, IIRC, mention is going 
to be removed.  It's information reduction, not data reduction.
+ There will be a skolemization Note from the RDF-WG to deal with the 
practical matters of dealing with bNodes.
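A hedged sketch of those last two points: bNodes told apart by object identity, and a Skolem IRI minted under the ".well-known/genid" path. The authority and helper names are invented for illustration.

```java
// Sketch only: a bNode with no exposed id still has object identity,
// and skolemization replaces it with a globally unique IRI so other
// systems can refer to it.
import java.util.UUID;

public class BNodes {
    // No exposed id: equality is plain object identity.
    static final class BNode {}

    // Mint a Skolem IRI under the well-known genid path.
    static String skolemize(BNode b, String authority) {
        return "http://" + authority + "/.well-known/genid/" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        BNode b1 = new BNode(), b2 = new BNode();
        System.out.println(b1.equals(b2)); // false: two bNodes are distinguishable
        System.out.println(skolemize(b1, "example.org")
                .startsWith("http://example.org/.well-known/genid/")); // true
    }
}
```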

RDF as a data model for linked data: it's a data structure with good 
properties for combining.  And it has links.
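A toy illustration of those combining properties, treating graphs as plain sets of triple strings rather than any real API:

```java
// Graphs as sets of triples: union, difference (diff), and applying
// a diff (patch) fall out of set semantics for free.
import java.util.*;

public class GraphSets {
    public static void main(String[] args) {
        Set<String> g1 = new HashSet<>(List.of("<s> <p> <o1>", "<s> <p> <o2>"));
        Set<String> g2 = new HashSet<>(List.of("<s> <p> <o2>", "<s> <p> <o3>"));

        Set<String> union = new HashSet<>(g1);
        union.addAll(g2);
        System.out.println(union.size()); // 3: the shared triple collapses

        Set<String> removed = new HashSet<>(g1);
        removed.removeAll(g2);            // in g1 but not in g2
        System.out.println(removed);      // [<s> <p> <o1>]
    }
}
```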

>> (for generalised systems such as rules engine - and for SPARQL - triples
>> can arise with extras like literals as subjects; they get removed later)
> If this shall be an API for interoperability based on the RDF standard,
> I wonder if it should be possible to expose such intermediate constructs.

My suggestion is that the API for interoperability is designed to 
support RDF standards.

The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.

But also storage, SPARQL (Query and Update), and web access (e.g. conneg).

(and inference, but it seems to me that inference engines have adopted 
more "individual" (data object), not triple, styles)
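The key elements above could be sketched as a minimal type vocabulary; the names are invented for illustration and are not a proposal for any actual API:

```java
// Illustrative only: IRIs, literals, triples, quads as a tiny
// hierarchy. Real designs (Jena, Sesame, Clerezza) differ in detail.
public class Terms {
    interface RDFTerm {}
    record IRI(String value) implements RDFTerm {}
    record Literal(String lexical, IRI datatype) implements RDFTerm {}
    static final class BNode implements RDFTerm {}   // identity only, no exposed id
    record Triple(RDFTerm s, IRI p, RDFTerm o) {}
    record Quad(IRI graphName, Triple triple) {}     // graphs extend to datasets

    public static void main(String[] args) {
        IRI p = new IRI("http://example/p");
        Triple t = new Triple(new BNode(), p,
            new Literal("42", new IRI("http://www.w3.org/2001/XMLSchema#integer")));
        System.out.println(t.p().value()); // http://example/p
    }
}
```

Records give the value equality terms need (two IRIs with the same string are the same term), while the bNode class keeps identity semantics.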

> Reto

