incubator-clerezza-dev mailing list archives

From Rupert Westenthaler <rupert.westentha...@gmail.com>
Subject Re: Future of Clerezza and Stanbol
Date Tue, 13 Nov 2012 12:13:00 GMT
Hi all,

I would like to share some thoughts/comments and suggestions from my side:

ResourceFactory: Clerezza is missing a Factory for RDF resources. I
would like to have such a Factory. The Factory should be obtainable
via the Graph - the Collection of Triples. IMO such a Factory is
required if all resource types (IRI, Bnode, Literal) are represented
by interfaces.
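
To make the idea concrete, here is a minimal sketch of a factory that is obtained from the Graph. All type and method names (Resource, Iri, ResourceFactory, SimpleGraph, getResourceFactory, ...) are hypothetical and chosen for illustration only; they are not the current Clerezza API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; not the actual Clerezza interfaces.
interface Resource {}
interface Iri extends Resource { String getUnicodeString(); }
interface BNode extends Resource {}
interface Literal extends Resource { String getLexicalForm(); }

// Creates resources in whatever representation the backend prefers.
interface ResourceFactory {
    Iri createIri(String iri);
    BNode createBNode();
    Literal createLiteral(String lexicalForm);
}

interface Graph {
    ResourceFactory getResourceFactory();   // the factory is obtained via the Graph
    boolean add(Resource subject, Iri predicate, Resource object);
    int size();
}

// Trivial in-memory implementation; slot constraints and duplicate
// elimination are omitted for brevity.
final class SimpleGraph implements Graph {
    private final List<Resource[]> triples = new ArrayList<>();

    private final ResourceFactory factory = new ResourceFactory() {
        @Override public Iri createIri(String iri) { return () -> iri; }
        @Override public BNode createBNode() { return new BNode() {}; }
        @Override public Literal createLiteral(String lexicalForm) { return () -> lexicalForm; }
    };

    @Override public ResourceFactory getResourceFactory() { return factory; }

    @Override public boolean add(Resource subject, Iri predicate, Resource object) {
        return triples.add(new Resource[] { subject, predicate, object });
    }

    @Override public int size() { return triples.size(); }
}
```

Callers would only ever see the interfaces, so a backend could hand out its own native resource objects through the factory.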

BNodes: If BNode is an interface, then any implementation is free to
internally use a "bnode-id". One argument in favor of such ids (that has not
yet been mentioned) is that they let you avoid in-memory mappings
for bnodes when wrapping a native implementation. In Clerezza you
currently need bidirectional maps for this.
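
A rough sketch of the wrapping argument, assuming a made-up NativeBNode class standing in for the blank node type of some wrapped library (WrappedBNode and toNative are likewise hypothetical names):

```java
import java.util.Objects;

// Hypothetical stand-in for a blank node of a wrapped native RDF library.
final class NativeBNode {
    final String label;
    NativeBNode(String label) { this.label = label; }
}

interface BNode {}

// The wrapper keeps the native id internally, so converting in either
// direction needs no BNode <-> NativeBNode lookup table.
final class WrappedBNode implements BNode {
    private final String nativeId;   // implementation detail, not exposed by BNode

    WrappedBNode(NativeBNode nativeNode) { this.nativeId = nativeNode.label; }

    NativeBNode toNative() { return new NativeBNode(nativeId); }

    @Override public boolean equals(Object o) {
        return o instanceof WrappedBNode && ((WrappedBNode) o).nativeId.equals(nativeId);
    }

    @Override public int hashCode() { return Objects.hash(nativeId); }
}
```

The id stays an implementation detail here: the BNode interface itself exposes nothing, which keeps the API-level question of bnode ids open.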

Triple, Quads: While for some use cases the Triple-in-Graph based API
(Quad := Triple t =
TripleStore#getGraph(context).filter(subject,predicate,object)) is
sufficient, this is no longer the case as soon as applications want to
work with a Graph that contains Quads with several contexts. So I
would vote for having support for Quads.
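
As an illustration of what first-class Quad support could look like, here is a small in-memory sketch. Quad and InMemoryQuadStore are hypothetical names, and terms are plain strings only to keep the example short:

```java
import java.util.ArrayList;
import java.util.List;

// A triple plus the context (named graph) it belongs to.
final class Quad {
    final String context, subject, predicate, object;

    Quad(String context, String subject, String predicate, String object) {
        this.context = context;
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class InMemoryQuadStore {
    private final List<Quad> quads = new ArrayList<>();

    void add(Quad q) { quads.add(q); }

    // null acts as a wildcard for any of the four positions, so one call
    // can match quads across several contexts.
    List<Quad> filter(String context, String subject, String predicate, String object) {
        List<Quad> result = new ArrayList<>();
        for (Quad q : quads) {
            if (matches(context, q.context) && matches(subject, q.subject)
                    && matches(predicate, q.predicate) && matches(object, q.object)) {
                result.add(q);
            }
        }
        return result;
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}
```

With the context as a wildcard, a single filter call returns matches from all contexts together with the context each match came from, which the per-graph filter cannot express.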

Dataset, Graph: From a user's perspective, Dataset (how the TripleStore
looks at the Triples) and Graph (how RDF looks at the Triples) are not
so different. Because of that I would like to have a single domain
object that fits both. The API should focus on the Graph aspects (as
Clerezza does) while still allowing efficient implementations that do
not load all triples into memory (e.g. by using closeable iterators).
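
A minimal sketch of the closeable-iterator idea, assuming hypothetical names (CloseableIterator, CursorIterator). A real backend would release a cursor or connection in close(); here a flag stands in for that:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// An iterator whose backing resources can be released explicitly.
interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {
    @Override void close();   // narrowed: no checked exception
}

// Wraps any iterator; stands in for a cursor over a store that streams
// results instead of loading them all into memory.
final class CursorIterator<T> implements CloseableIterator<T> {
    private final Iterator<T> delegate;
    private boolean closed;

    CursorIterator(Iterator<T> delegate) { this.delegate = delegate; }

    @Override public boolean hasNext() { return !closed && delegate.hasNext(); }

    @Override public T next() {
        if (closed) throw new NoSuchElementException("iterator is closed");
        return delegate.next();
    }

    @Override public void close() { closed = true; }   // a real store would free its cursor here

    boolean isClosed() { return closed; }
}
```

Because the interface extends AutoCloseable, callers can iterate inside a try-with-resources block and the cursor is released even if they stop early.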

Immutable Graphs: I had real problems getting this right, and the
current Clerezza API does not help with that task (resulting in things
like read-only mutable graphs that are not Graphs, as they only provide
a read-only view on a Graph that might still be changed by other
means). I think read-only Graphs (like
Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
use case of protecting a returned graph from modification by the caller
of the method is much more prominent than truly immutable graphs.
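
The read-only-view behaviour can be sketched with the standard collections API. GraphHolder is a hypothetical class and triples are plain strings for brevity; only Collections.unmodifiableCollection is real JDK API:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;

// Owns a mutable collection of triples and hands out a read-only view of it.
final class GraphHolder {
    private final Collection<String> triples = new HashSet<>();

    // Only the owner can modify the underlying data.
    void addInternally(String triple) { triples.add(triple); }

    // Callers cannot modify the returned view, but it is not immutable:
    // later changes made by the owner remain visible through it.
    Collection<String> getGraph() {
        return Collections.unmodifiableCollection(triples);
    }
}
```

A caller invoking add(..) on the returned view gets an UnsupportedOperationException, while changes made through addInternally(..) still show up in the view. That covers the "protect from the caller" case without claiming immutability.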

SPARQL: I would not deal with parsing SPARQL queries but rather
forward them as is to the underlying implementation. In that case the
API would only need to bother with result sets. This would also avoid
the need to deal with "Datasets". This is not an argument against a
fallback (e.g. the trick Clerezza does by using the Jena SPARQL
implementation), but in practice efficient SPARQL execution can only
happen natively within the TripleStore. Trying to do otherwise will
only trick users into use cases that will not scale.
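
A sketch of the pass-through approach, under the assumption that the backend exposes some way to execute a query string. NativeStore, executeSelect and PassThroughQueryEngine are hypothetical names, and result rows are simplified to variable-name/value maps:

```java
import java.util.List;
import java.util.Map;

// Hypothetical native store that parses and executes SPARQL itself.
interface NativeStore {
    List<Map<String, String>> executeSelect(String sparql);
}

// API-level facade: no query parsing here, only result sets cross the boundary.
final class PassThroughQueryEngine {
    private final NativeStore store;

    PassThroughQueryEngine(NativeStore store) { this.store = store; }

    List<Map<String, String>> select(String sparql) {
        return store.executeSelect(sparql);   // forwarded unchanged
    }
}
```

A fallback engine for stores without native SPARQL could implement the same NativeStore interface, so the facade would not need to know which of the two it is talking to.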

best
Rupert

On Tue, Nov 13, 2012 at 9:08 AM, Reto Bachmann-Gmür <reto@wymiwyg.com> wrote:
> On Mon, Nov 12, 2012 at 10:40 PM, Andy Seaborne <andy@apache.org> wrote:
>
>> On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
>>
>>> On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <andy@apache.org> wrote:
>>>
>>>  On 09/11/12 09:56, Rupert Westenthaler wrote:
>>>>
>>>>  RDF libs:
>>>>> ====
>>>>>
>>>>> From the viewpoint of Apache Stanbol one needs to ask the question
>>>>> whether it makes sense to maintain its own RDF API. I expect the Semantic Web
>>>>> Standards to evolve quite a bit in the coming years and I do have
>>>>> concerns about whether the Clerezza RDF modules will be updated/extended to
>>>>> provide implementations of those. One example of such a situation is
>>>>> SPARQL 1.1 that is around for quite some time and is still not
>>>>> supported by Clerezza. While I do like the small API, the flexibility
>>>>> to use different TripleStores and that Clerezza comes with OSGI
>>>>> support I think given the current situation we would need to discuss
>>>>> all options and those do also include a switch to Apache Jena or
>>>>> Sesame. Especially Sesame would be an attractive option as their RDF
>>>>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>>>>> counterparts (Model [2] and Graph [3]) are considerably different and
>>>>> more complex interfaces. In addition Jena will only change to
>>>>> org.apache packages with the next major release so a switch before
>>>>> that release would mean two incompatible API changes.
>>>>>
>>>>>
>>>> Jena isn't changing the packaging as such -- what we've discussed is
>>>> providing a package for the current API and then a new, org.apache API.
>>>>   The new API may be much the same as the existing one or it may be
>>>> different - that depends on contributions made!
>>>>
>>>>
>>> I didn't know about jena planning to introduce such a common API.
>>>
>>>
>>>> I'd like to hear more about your experiences esp. with Graph API as that
>>>> is supposed to be quite simple - it's targeted at storage extensions as
>>>> well as supporting the richer Model API.  Personally, aside from the fact
>>>> that Clerezza enforces slot constraints (no literals as subjects), the
>>>> Jena
>>>> Graph API and Clerezza RDF core API seem reasonably aligned.
>>>>
>>>>
>>> Yes, the slot constraints come from the RDF abstract syntax. In my opinion
>>> it's something one could decide to relax: by adding appropriate owl:sameAs
>>> bnodes, any graph could be transformed into an rdf-abstract-syntax compliant
>>> one. So maybe have a GenericTripleCollection that can be converted to an
>>> RDFTripleCollection - not sure. Just sticking to the spec and waiting till
>>> this is allowed by the abstract syntax might be the easiest.
>>>
>>
>> At the core, unconstrained slots has worked best for us.
>>
>
> The question is whether this should be part of a common API. For machinery doing
> inference and dealing with the meaning of RDF graphs, resources should also
> be associated with a set of IRIs (that serialize into owl:sameAs).
>
>
>>
>> Then either:
>>
>> 1/ have a test like:
>>   Triple.isValidRDF
>>
>> 2/ Layer an app API to impose the constraints (but it's easy to run out of
>> good names).
>>
>
> The clerezza API would be such a layer.
>
>
>>
>>
>> The Graph/Node/Triple level in Jena is an API but its primary role is the
>> other side, to storage and inference, not apps.
>>
>> Generality gives
>> A/ Future proofing (not perfect)
>> B/ Arises in inference and query naturally.
>> C/ using RDF structures for processing RDF
>>
>> Nodes in triples can be variables, and I would have found it useful to
>> have marker nodes to be able to build structures e.g. "known to be bound at
>> this point in a query".  As it was, I ended up creating parallel structures.
>>
>>
>>  Where I see advantages of the clerezza API:
>>> - Bases on collections framework so standard tools can be used for graphs
>>>
>>
>> Given a core system API, a scala and clojure and even different Java APIs
>> for different styles are all possible.
>>
>
> Right. That's why I propose having a minimum API and decorators as to
> provide scala interfacing or the resource api for java ( which corresponds
> more or less to the W3C RDF API draft)
>
>
>>
>> A universal API across systems is about plugging in machinery (parser,
>> query engines, storage, inference).  It's good to separate that from
>> application APIs otherwise there is a design tension.
>
> I'm wondering if there need to be special hooks for inference or if this
> cannot just as well be done by simply wrapping the graphs.
>
>
>>
>>
>>  - Immutable graphs follow identity criterion of RDF semantics, this allows
>>> graph component to be added to sets and more straight forwardly implement
>>> diff and patch algorithms
>>> - BNode have no ids: apart from promoting the usage of URIs where this is
>>> appropriate it allows behind the scenes leanification and saves memory
>>> where the backend doesn't have such ids.
>>>
>>
>> We have argued about this before.
>>
>> + As you have objects, there is a concept of identity (you can tell two
>> bNodes apart).
>>
> No, two bnodes might be indistinguishable, as in
>
> a :knows b
> b :knows a
>
> You cannot tell them apart even though neither of them can be leanified away
>
>
>> + For persistence, an internal id is necessary to reconstruct consistently
>> with caches.
>>
>
> Here we are talking about some implementation stuff that imho should be
> separate from API discussion. Do you accept my Toy-usecase challenge [1]?
> If we leave the classical dedicated triple store use case scenario, the id
> quickly becomes something that makes things harder rather than easier.
>
>
>> + Leaning isn't a core feature of RDF.  In fact, IIRC, mention is going to
>> be removed.  It's information reduction, not data reduction.
>>
>
> It simply arises from bnodes being existential variables. If they are
> redefined to be something else then I have difficulties seeing what
> advantages they would still offer over named nodes (maybe in some skolem: uri
> scheme)
>
>
>> + There will be a skolemization Note from RDF-WG to deal with the
>> practical matters of dealing with bNodes.
>>
>> RDF as data model for linked data.
>>
>> It's a data structure with good properties for combining.  And it has links.
>>
>>
>>
>>>
>>>
>>>
>>>> (for generalised systems such as rules engine - and for SPARQL - triples
>>>> can arise with extras like literals as subjects; they get removed later)
>>>>
>>>
>>>
>>> If this is to be an API for interoperability based on the RDF standard, I
>>> wonder whether it should be possible to expose such intermediate constructs.
>>>
>>
>> My suggestion is that the API for interoperability is designed to support
>> RDF standards.
>>
>> The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.
>>
>
> Datasets are an element of the relevant sparql spec, I don't see Quads.
>
>
>>
>> But also storage, SPARQL (Query and Update), and web access (e.g. conneg).
>>
>
> Clerezza is very strong on conneg but I don't think this would be part of
> the rdf core api, but rather of the parts that could be part of Stanbol and
> provide a Linked Data Platform Container (LDPC).
>
>
> Reto
>
> 1.
> http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-%3DXkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ%40mail.gmail.com%3E



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
