incubator-clerezza-dev mailing list archives

From Reto Bachmann-Gmür <r...@wymiwyg.com>
Subject Re: Toy-Usecase challenge for comparing RDF APIs to wrap data (was Re: Future of Clerezza and Stanbol)
Date Tue, 13 Nov 2012 11:39:33 GMT
Hi Sebastian,

On Tue, Nov 13, 2012 at 11:52 AM, Sebastian Schaffert <
sebastian.schaffert@salzburgresearch.at> wrote:

> Hi Reto,
>
> I don't understand the use case, and I don't think it is well suited for
> comparing different RDF APIs.
>

Isn't that a slight contradiction? ;)

Understanding: you have a set of contact objects; we don't care where they
come from or how many there are, we just have some contacts. Now we would
like to deal with them as an RDF data source.

Well-suitedness: an RDF application typically doesn't have the privilege
of having only graphs as inputs. It will have to deal with `Contact`s,
`StockQuote`s and `WeatherForecast`s. Having a wrapper on these objects
that makes them RDF graphs is the first step towards processing them with
the generic RDF tools and, e.g., merging them with other RDF data.
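To make the wrapper idea concrete, here is a minimal sketch for the `Person` interface of the toy use case quoted below. It assumes a hypothetical, deliberately tiny read-only API: `Triple`, `SimplePerson`, `PersonGraph` and `ToyChallenge` are invented for illustration, `Collection<Triple>` stands in for the challenge's `Graph`, terms are plain strings (IRIs and literals are not distinguished), and the `urn:example:person:` subject IRIs are placeholders. It is not the Clerezza, Jena or Sesame API. Because triples are computed on demand from the caller's set, the view reflects later changes to it and needs only constant extra space beyond the set itself (Java 16+).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.AbstractCollection;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical minimal triple type: terms are plain strings in this sketch.
record Triple(String subject, String predicate, String object) {}

// The Person interface from the toy use case.
interface Person {
    String getGivenName();
    String getLastName();
}

// A simple Person implementation; record equality compares both names.
record SimplePerson(String givenName, String lastName) implements Person {
    public String getGivenName() { return givenName; }
    public String getLastName() { return lastName; }
}

// Read-only, live view of a Set<Person> as a collection of triples.
final class PersonGraph extends AbstractCollection<Triple> {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String FOAF = "http://xmlns.com/foaf/0.1/";

    private final Set<? extends Person> persons; // the caller's set, not a copy

    PersonGraph(Set<? extends Person> persons) { this.persons = persons; }

    // Placeholder subject IRI derived from the two names that define equality.
    static String subjectOf(Person p) {
        return "urn:example:person:" + encode(p.getGivenName()) + "/" + encode(p.getLastName());
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // The three triples describing one person, computed on demand.
    private static List<Triple> triplesOf(Person p) {
        String s = subjectOf(p);
        return List.of(
            new Triple(s, RDF_TYPE, FOAF + "Person"),
            new Triple(s, FOAF + "givenName", p.getGivenName()),
            new Triple(s, FOAF + "familyName", p.getLastName()));
    }

    @Override
    public Iterator<Triple> iterator() {
        // The set's iterator is fail-fast, so modifying the set mid-iteration
        // surfaces as a ConcurrentModificationException from people.next().
        Iterator<? extends Person> people = persons.iterator();
        return new Iterator<>() {
            private Iterator<Triple> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                while (!current.hasNext() && people.hasNext()) {
                    current = triplesOf(people.next()).iterator();
                }
                return current.hasNext();
            }

            @Override
            public Triple next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
            // remove() is not overridden, so it throws UnsupportedOperationException.
        };
    }

    // Equal persons share names, so distinct persons yield distinct triples.
    @Override
    public int size() { return 3 * persons.size(); }
}

final class ToyChallenge {
    // Stand-in for the challenge's getAsGraph; add() is inherited from
    // AbstractCollection and always throws UnsupportedOperationException.
    static Collection<Triple> getAsGraph(Set<? extends Person> persons) {
        return new PersonGraph(persons);
    }
}
```

Usage: pass a live `HashSet<Person>` to `getAsGraph`; adding a person to the set afterwards makes three more triples visible through the same view without copying anything.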


>
> If this is really an issue, I would suggest coming up with a bigger
> collection of RDF API usage scenarios that are also relevant in practice
> (as proven by a software project using it). Including scenarios how to deal
> with bigger amounts of data (i.e. beyond toy examples). My scenarios
> typically include >= 100 million triples. ;-)
>
> In addition to what Andy said about wrapper APIs, I would also like to
> emphasise the incurred memory and computation overhead of wrapper APIs. Not
> an issue if you have only a handful of triples, but a big issue when you
> have 100 million.
>

It's a common misconception to think that Java sets are limited to 2^31-1
elements, but even that would be more than 100 million. In the challenge I
didn't ask about time complexity; it would be fair to ask for that too if you
want to analyze scenarios with such large numbers of triples.
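For scale: 2^31-1 is `Integer.MAX_VALUE`, i.e. 2,147,483,647, roughly 21 times the 100 million triples mentioned above. The `java.util.Collection` contract for `size()` only caps the *reported* count, returning `Integer.MAX_VALUE` when a collection holds more elements. A trivial check of that arithmetic (the class and method names are illustrative only):

```java
// Compares Java's reported-size cap with a triple count of 100 million.
final class SetSizeCheck {
    // Collection.size() reports at most Integer.MAX_VALUE (2^31 - 1).
    static final long MAX_REPORTED_SIZE = Integer.MAX_VALUE;

    // How many times a given triple count fits under that cap (integer division).
    static long headroom(long triples) {
        return MAX_REPORTED_SIZE / triples;
    }

    public static void main(String[] args) {
        System.out.println(MAX_REPORTED_SIZE);       // 2147483647
        System.out.println(headroom(100_000_000L));  // 21
    }
}
```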


> A possible way to bypass the wrapper issue is the approach followed by
> JDOM for XML, which we tried to use also in LDPath: abstract away the whole
> data model and API using Java Generics. This is typically very efficient
> (at runtime you are working with the native types), but it is also complex
> and ugly (you end up with a big list of methods implementing delegation as
> in
> http://code.google.com/p/ldpath/source/browse/ldpath-api/src/main/java/at/newmedialab/ldpath/api/backend/RDFBackend.java
> ).
>
I think this only supports accessing graphs and not creating graph
objects, so I'm afraid you can't take the challenge with that one.
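A rough sketch of the generics approach, under invented names: algorithms are written against a backend interface parameterised by the backend's native node type, so no wrapper objects are allocated around terms. `Backend`, `PathWalker` and `MapBackend` are hypothetical and are not LDPath's actual `RDFBackend`. Consistent with the point above, the shared interface only reads; adding data stays backend-specific (here `MapBackend.add`).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical generic backend: Node is the backend's own native term type.
interface Backend<Node> {
    Collection<Node> listObjects(Node subject, Node property);
}

// A generic algorithm written once against Backend<Node>: follow a property path.
final class PathWalker {
    static <Node> Set<Node> follow(Backend<Node> backend, Node start, List<Node> path) {
        Set<Node> current = new LinkedHashSet<>(List.of(start));
        for (Node property : path) {
            Set<Node> next = new LinkedHashSet<>();
            for (Node node : current) {
                next.addAll(backend.listObjects(node, property));
            }
            current = next;
        }
        return current;
    }
}

// Toy backend whose native node type is String; adding data is not part
// of the generic interface above.
final class MapBackend implements Backend<String> {
    private final Map<List<String>, List<String>> objects = new HashMap<>();

    void add(String subject, String property, String object) {
        objects.computeIfAbsent(List.of(subject, property), k -> new ArrayList<>()).add(object);
    }

    @Override
    public Collection<String> listObjects(String subject, String property) {
        return objects.getOrDefault(List.of(subject, property), List.of());
    }
}
```

The design trade-off is visible even at this size: `PathWalker` runs directly on native nodes, but every operation an algorithm needs has to be added to `Backend` and delegated by each implementation.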



>
> My favorite way would be a common interface-based model for RDF in Java,
> implemented by different backends. This would require the involvement of at
> least the Jena and the Sesame people. The Sesame model already comes close
> to it, but of course also adds some concepts that are specific to Sesame
> (e.g. the repository concept and the way contexts/named graphs are
> handled), as we discussed some months ago.
>

Yes, that was the thread:
http://mail-archives.apache.org/mod_mbox/incubator-stanbol-dev/201208.mbox/%3CCAMmeZRmQcQP1syT=ccDG=fSXHOQA4OcAvcrBkHTXritiwT353A@mail.gmail.com%3E

I think such an interface-based common API is the goal. Let's compare the
approaches we have. Let's create different use cases to see how the existing
APIs compare; the challenge I posed is just a start.
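For contrast, one hypothetical shape for such an interface-based common model: the term and graph types themselves are shared interfaces, each backend implements them, and utilities are written once against the interfaces. None of these names (`Term`, `Iri`, `Literal`, `Statement`, `StatementSource`, `ListSource`, `GraphUtils`) come from Jena, Sesame or Clerezza; blank nodes and named graphs are left out to keep the sketch short (Java 17+).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical shared term model every backend would agree on
// (blank nodes and named graphs omitted for brevity).
sealed interface Term permits Iri, Literal {}
record Iri(String value) implements Term {}
record Literal(String lexicalForm) implements Term {}
record Statement(Iri subject, Iri predicate, Term object) {}

// Hypothetical common graph interface; a null argument acts as a wildcard.
interface StatementSource extends Iterable<Statement> {
    Iterator<Statement> filter(Iri subject, Iri predicate, Term object);

    @Override
    default Iterator<Statement> iterator() {
        return filter(null, null, null);
    }
}

// One possible backend: a plain in-memory list. A store-backed backend would
// implement the same interface, ideally answering filter() from its indexes.
final class ListSource implements StatementSource {
    private final List<Statement> statements = new ArrayList<>();

    void add(Statement statement) {
        statements.add(statement);
    }

    @Override
    public Iterator<Statement> filter(Iri s, Iri p, Term o) {
        return statements.stream()
                .filter(st -> (s == null || s.equals(st.subject()))
                        && (p == null || p.equals(st.predicate()))
                        && (o == null || o.equals(st.object())))
                .iterator();
    }
}

// A utility written only against the interfaces, so it works with any backend.
final class GraphUtils {
    static int countWithPredicate(StatementSource source, Iri predicate) {
        int count = 0;
        for (Iterator<Statement> it = source.filter(null, predicate, null); it.hasNext(); it.next()) {
            count++;
        }
        return count;
    }
}
```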

Cheers,
Reto

>
> Greetings,
>
> Sebastian
>
> On 12 Nov 2012, at 20:45, Reto Bachmann-Gmür wrote:
>
> > May I suggest the following toy use case for comparing different API
> > proposals (we know all APIs can be used for triple stores, so it seems
> > interesting how they can be used to expose any data as RDF, and what the
> > space complexity of such an adapter is):
> >
> > Given
> >
> > interface Person {
> >     String getGivenName();
> >     String getLastName();
> >     /**
> >      * @return true if other is an instance of Person with the same GivenName
> >      * and LastName, false otherwise
> >      */
> >     boolean equals(Object other);
> > }
> >
> > Provide a method
> >
> > Graph getAsGraph(Set<Person> persons);
> >
> > where `Graph` is the API of an RDF Graph that can change over time. The
> > returned `Graph` shall (if possible) be backed by the Set passed as argument
> > and thus reflect future changes to that set. The Graph shall support all
> > read operations but no addition or removal of triples. It's OK if some
> > iterations over the graph result in a ConcurrentModificationException if the
> > set changes during iteration (as one would get when iterating over the set
> > during such a modification).
> >
> > - What does the code look like?
> > - Is it backed by the Set, and does the resulting Graph reflect changes to
> > the set?
> > - What's the space complexity?
> >
> > Challenge accepted?
> >
> > Reto
> >
> > On Mon, Nov 12, 2012 at 6:11 PM, Andy Seaborne <andy@apache.org> wrote:
> >
> >> On 11/11/12 23:22, Rupert Westenthaler wrote:
> >>
> >>> Hi all,
> >>>
> >>> On Sun, Nov 11, 2012 at 4:47 PM, Reto Bachmann-Gmür <reto@apache.org>
> >>> wrote:
> >>>
> >>>> - clerezza.rdf graduates as commons.rdf: a modular java/scala
> >>>> implementation of rdf related APIs, usable with and without OSGi
> >>>>
> >>>
> >>> For me this immediately raises the question: Why should the Clerezza
> >>> API become commons.rdf if 90+% (just a guess) of the Java RDF stuff is
> >>> based on Jena and Sesame? Creating an Apache commons project based on
> >>> an RDF API that is only used by a very low percentage of all Java RDF
> >>> applications is not feasible. Generally I see not much room for a
> >>> commons RDF project as long as there is not a commonly agreed RDF API
> >>> for Java.
> >>>
> >>
> >> Very good point.
> >>
> >> There is a finite and bounded supply of energy of people to work on such a
> >> thing and to make it work for the communities that use it. For all of us,
> >> work on A means less work on B.
> >>
> >>
> >> An "RDF API" for applications needs to be more than RDF. A SPARQL engine
> >> is not simply abstracted from the storage by some "list(s,p,o)" API call.
> >> It will die at scale, where scale here includes in-memory usage.
> >>
> >> My personal opinion is that wrapper APIs are not the way to go - they end
> >> up as a new API in themselves and the fact they are backed by different
> >> systems is really an implementation detail. They end up having design
> >> opinions and gradually require more and more maintenance as they add more
> >> and more.
> >>
> >> API bridges are better (mapping one API to another - we are really talking
> >> about a small number of APIs, not 10s) as they expose the advantages of
> >> each system.
> >>
> >> The ideal is a set of interfaces systems can agree on. I'm going to be
> >> contributing to the interfacization of the Graph API in Jena - if you have
> >> thoughts, send email to a list.
> >>
> >>        Andy
> >>
> >> PS See the work being done by Stephen Allen on coarse grained APIs:
> >>
> >> http://mail-archives.apache.org/mod_mbox/jena-dev/201206.mbox/%3CCAPTxtVOMMWxfk2%2B4ciCExUBZyxsDKvuO0QshXF8uKhaD8txXjA%40mail.gmail.com%3E
> >>
> >>
> >>
>
> Sebastian
> --
> | Dr. Sebastian Schaffert          sebastian.schaffert@salzburgresearch.at
> | Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
> | Head of Knowledge and Media Technologies Group          +43 662 2288 423
> | Jakob-Haringer Strasse 5/II
> | A-5020 Salzburg
>
>
