stanbol-dev mailing list archives

From Sebastian Schaffert <sebastian.schaff...@salzburgresearch.at>
Subject Re: Toy-Usecase challenge for comparing RDF APIs to wrap data (was Re: Future of Clerezza and Stanbol)
Date Tue, 13 Nov 2012 10:52:16 GMT
Hi Reto,

I don't understand the use case, and I don't think it is well suited for comparing different
RDF APIs. 

If this is really an issue, I would suggest coming up with a bigger collection of RDF API
usage scenarios that are also relevant in practice (as proven by a software project using
them), including scenarios showing how to deal with larger amounts of data (i.e. beyond
toy examples). My scenarios typically include >= 100 million triples. ;-)

In addition to what Andy said about wrapper APIs, I would also like to emphasise the incurred
memory and computation overhead of wrapper APIs. Not an issue if you have only a handful of
triples, but a big issue when you have 100 million.

A possible way to bypass the wrapper issue is the approach followed by JDOM for XML, which
we also tried to use in LDPath: abstract away the whole data model and API using Java Generics.
This is typically very efficient (at runtime you are working with the native types), but it
is also complex and ugly (you end up with a big list of methods implementing delegation, as
in http://code.google.com/p/ldpath/source/browse/ldpath-api/src/main/java/at/newmedialab/ldpath/api/backend/RDFBackend.java).
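In code, the generics-based approach looks roughly like this (a deliberately minimal sketch with invented names; the real RDFBackend interface linked above is far larger):

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Minimal sketch of the generics-based approach: the RDF node type of the
// underlying library becomes a type parameter, so generic code operates
// directly on native Jena or Sesame nodes without wrapper objects.
interface Backend<Node> {
    Node createURI(String uri);
    String stringValue(Node node);
    Collection<Node> listObjects(Node subject, Node property);
}

// Code written once against the backend runs unchanged on any implementation.
final class PathEval {
    static <Node> List<String> objectValues(Backend<Node> backend,
                                            Node subject, String property) {
        return backend.listObjects(subject, backend.createURI(property)).stream()
                .map(backend::stringValue)
                .collect(Collectors.toList());
    }
}
```

The price is that every single model operation needs such a delegation method, which is exactly where the long, ugly interface comes from.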

My favorite way would be a common interface-based model for RDF in Java, implemented by different
backends. This would require the involvement of at least the Jena and the Sesame people. The
Sesame model already comes close to it, but of course also adds some concepts that are specific
to Sesame (e.g. the repository concept and the way contexts/named graphs are handled), as
we discussed some months ago.
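As a rough illustration of what such a shared, interface-based model could look like (all names below are invented for this sketch, not an agreed API):

```java
// Invented sketch of a minimal common RDF model: pure interfaces that each
// store (Jena, Sesame, Clerezza, ...) could implement natively.
interface RDFNode { }

interface IRI extends RDFNode {
    String getIRIString();
}

interface Literal extends RDFNode {
    String getLexicalForm();
}

interface Triple {
    RDFNode getSubject();
    IRI getPredicate();
    RDFNode getObject();
}

interface Graph extends Iterable<Triple> {
    long size();
    boolean contains(Triple triple);
}
```

Applications written against such interfaces would stay backend-neutral, while each store keeps its native internals.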

Greetings,

Sebastian

Am 12.11.2012 um 20:45 schrieb Reto Bachmann-Gmür:

> May I suggest the following toy-usecase for comparing different API
> proposals (we know all APIs can be used for triple stores, so it seems
> interesting how they can be used to expose arbitrary data as RDF, and what
> the space complexity of such an adapter is):
> 
> Given
> 
> interface Person {
>     String getGivenName();
>     String getLastName();
>     /**
>      * @return true if other is an instance of Person with the same given
>      *         name and last name, false otherwise
>      */
>     boolean equals(Object other);
> }
> 
> Provide a method
> 
> Graph getAsGraph(Set<Person> persons);
> 
> where `Graph` is the API of an RDF Graph that can change over time. The
> returned `Graph` shall (if possible) be backed by the Set passed as argument
> and thus reflect future changes to that set. The Graph shall support all
> read operations but no addition or removal of triples. It's ok if iteration
> over the graph results in a ConcurrentModificationException when the set
> changes during iteration (as one would get when iterating over the set
> during such a modification).
> 
> - What does the code look like?
> - Is it backed by the Set, and does the resulting Graph reflect changes to
> the set?
> - What's the space complexity?
> 
> Challenge accepted?
> 
> Reto
> 
> On Mon, Nov 12, 2012 at 6:11 PM, Andy Seaborne <andy@apache.org> wrote:
> 
>> On 11/11/12 23:22, Rupert Westenthaler wrote:
>> 
>>> Hi all ,
>>> 
>>> On Sun, Nov 11, 2012 at 4:47 PM, Reto Bachmann-Gmür <reto@apache.org>
>>> wrote:
>>> 
>>>> - clerezza.rdf graduates as commons.rdf: a modular java/scala
>>>> implementation of rdf related APIs, usable with and without OSGi
>>>> 
>>> 
>>> For me this immediately raises the question: Why should the Clerezza
>>> API become commons.rdf if 90+% (just a guess) of the Java RDF stuff is
>>> based on Jena and Sesame? Creating an Apache commons project based on
>>> an RDF API that is only used by a very low percentage of all Java RDF
>>> applications is not feasible. Generally, I do not see much room for a
>>> commons RDF project as long as there is not a commonly agreed RDF API
>>> for Java.
>>> 
>> 
>> Very good point.
>> 
>> There is a finite and bounded supply of energy among the people available
>> to work on such a thing and to make it work for the communities that use
>> it. For all of us, work on A means less work on B.
>> 
>> 
>> An "RDF API" for applications needs to be more than RDF. A SPARQL engine
>> is not simply abstracted from the storage by some "list(s,p,o)" API call.
>> It will die at scale, where scale here includes in-memory usage.
>> 
>> My personal opinion is that wrapper APIs are not the way to go - they end
>> up as a new API in themselves, and the fact that they are backed by
>> different systems is really an implementation detail. They end up having
>> design opinions and gradually require more and more maintenance as they
>> add more and more features.
>> 
>> API bridges are better (mapping one API to another - we are really talking
>> about a small number of APIs, not 10s) as they expose the advantages of
>> each system.
>> 
>> The ideal is a set of interfaces systems can agree on.  I'm going to be
>> contributing to the interfacization of the Graph API in Jena - if you have
>> thoughts, send email to a list.
>> 
>>        Andy
>> 
>> PS See the work being done by Stephen Allen on coarse grained APIs:
>> 
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201206.mbox/%3CCAPTxtVOMMWxfk2%2B4ciCExUBZyxsDKvuO0QshXF8uKhaD8txXjA%40mail.gmail.com%3E
>> 
>> 
>> 
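For what it's worth, Reto's adapter could be done with O(1) extra space if triples are generated lazily during iteration. A sketch against a hypothetical minimal model (the Triple record and the Graph-as-collection below are invented stand-ins, not any existing API):

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.Set;

// Invented minimal stand-ins for the RDF types in Reto's challenge.
record Triple(String subject, String predicate, String object) { }

interface Person {
    String getGivenName();
    String getLastName();
}

// Read-only graph view backed by the Set: no triples are materialised, so
// the extra space is O(1), and changes to the set show up on iteration.
final class PersonGraph extends AbstractCollection<Triple> {
    private static final String FOAF = "http://xmlns.com/foaf/0.1/";
    private final Set<Person> persons;

    PersonGraph(Set<Person> persons) {
        this.persons = persons;
    }

    @Override
    public int size() {
        return persons.size() * 2; // two triples per person
    }

    // Triples are produced on the fly; if the backing set changes during
    // iteration, the set's own iterator throws
    // ConcurrentModificationException, exactly as the challenge permits.
    @Override
    public Iterator<Triple> iterator() {
        Iterator<Person> it = persons.iterator();
        return new Iterator<>() {
            private Person current;          // person whose triples we emit
            private boolean givenNameDone;   // first triple already emitted?

            @Override
            public boolean hasNext() {
                return current != null || it.hasNext();
            }

            @Override
            public Triple next() {
                if (current == null) {
                    current = it.next();
                    givenNameDone = false;
                }
                // Node id derived from the names, consistent with equals().
                String subject = "urn:person:" + current.getGivenName()
                        + ":" + current.getLastName();
                if (!givenNameDone) {
                    givenNameDone = true;
                    return new Triple(subject, FOAF + "givenName",
                            current.getGivenName());
                }
                Triple t = new Triple(subject, FOAF + "familyName",
                        current.getLastName());
                current = null;
                return t;
            }
        };
    }
}
```

getAsGraph(Set<Person> persons) would then simply return new PersonGraph(persons); additions and removals are rejected because AbstractCollection's mutating methods throw UnsupportedOperationException by default.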

Sebastian
-- 
| Dr. Sebastian Schaffert          sebastian.schaffert@salzburgresearch.at
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group          +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg

