commons-dev mailing list archives

From Andy Seaborne <a...@apache.org>
Subject Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF
Date Mon, 19 Jan 2015 15:15:25 GMT
On 17/01/15 12:00, Bruno P. Kinoshita wrote:
> Hi Andy!
>
>> Jena can (and does) support multiple APIs over a common core.
>>
>> A commons-rdf API can be added alongside the existing APIs; that means
>> it is not a "big bang" to have commons-rdf interfaces supported.
>
> That's great! Would the commons-rdf dependency go in jena-core/pom.xml?
> Is it going to be necessary to change some classes in the core? I think
> it will be transparent for other modules like ARQ, Fuseki, Text. Is that right?

I don't think so - Jena's core is "generalized" RDF and this is important.
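
To illustrate what "generalized" means here - a throwaway sketch, not
anything in the Jena codebase, using the current com.hp.hpl.jena.graph
classes - the core Triple/Node layer does not restrict which kind of
node goes in which position:

// Illustration only: a literal in the subject position is accepted by
// the core Graph/Node layer, even though it is not spec-conformant RDF.
// That is why a strict commons-rdf view belongs in a layer above the
// core, not inside jena-core itself.
import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.NodeFactory;
import com.hp.hpl.jena.graph.Triple;

public class GeneralizedTripleDemo {
    public static void main(String[] args) {
        Node subject = NodeFactory.createLiteral("42");   // literal as subject
        Node predicate = NodeFactory.createURI("http://example.org/p");
        Node object = NodeFactory.createURI("http://example.org/o");

        // Legal at the core level; a conformant RDF API would reject it.
        Triple generalized = Triple.create(subject, predicate, object);
        System.out.println(generalized);
    }
}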

Just adding new interfaces to the core Node (etc.) objects isn't
ideal: you get multiple method names for the same thing.  And getting
the hashCode/equality contract to work across implementations (the
hashCode() of implementation A must be the same as the hashCode() of
implementation B whenever the terms are equal) is really quite tricky.
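
A rough sketch of the problem (the Iri interface and SimpleIri class
below are invented for illustration, not the commons-rdf API): for terms
from different implementations to mix in a HashSet or HashMap, every
implementation has to define equals() against the shared interface and
compute hashCode() by the same agreed formula.

import java.util.Objects;

// Hypothetical shared interface - a stand-in for whatever the common API defines.
interface Iri {
    String getIRIString();
}

// One implementation; any other implementation must use the *same*
// equals/hashCode definition or mixed collections silently break.
final class SimpleIri implements Iri {
    private final String iriString;

    SimpleIri(String iriString) {
        this.iriString = Objects.requireNonNull(iriString);
    }

    @Override
    public String getIRIString() {
        return iriString;
    }

    @Override
    public boolean equals(Object other) {
        // Equality is against the interface, not the concrete class, so an
        // IRI from implementation A can equal an IRI from implementation B.
        if (this == other) return true;
        if (!(other instanceof Iri)) return false;
        return iriString.equals(((Iri) other).getIRIString());
    }

    @Override
    public int hashCode() {
        // Every implementation must agree on this exact formula for
        // "equal terms have equal hash codes" to hold across libraries.
        return iriString.hashCode();
    }
}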

See also my comments about using classes not interfaces.

I personally do not see the worry about wrappers - for me what matters 
is the architectural difference between a presentation API, designed for 
applications to write code against, and a systems API, designed to support 
the machinery.  Java is really rather good at optimizing away the cost 
of wrappers, including optimizing method dispatch at call sites and 
coping with dynamically loaded code that changes the assumptions later.

So a new module, "jena-commons-rdf", that provides an application 
presentation API would be the obvious route to me.  Fuseki etc. would 
be unaffected.
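
Very roughly, something like this - purely illustrative, JenaBackedIri
is a made-up name rather than the actual module - where the wrapper
presents the application view and keeps an escape hatch down to the
Node underneath:

import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.NodeFactory;

// Presentation-API wrapper over the systems-level Node; jena-core stays
// untouched, and the JIT can largely optimise the indirection away.
public final class JenaBackedIri {
    private final Node node;

    private JenaBackedIri(Node node) {
        if (!node.isURI()) {
            throw new IllegalArgumentException("Not an IRI node: " + node);
        }
        this.node = node;
    }

    public static JenaBackedIri wrap(Node node) {
        return new JenaBackedIri(node);
    }

    // What the application sees.
    public String getIRIString() {
        return node.getURI();
    }

    // Escape hatch back to the systems API for performance-critical code.
    public Node asJenaNode() {
        return node;
    }

    public static void main(String[] args) {
        Node n = NodeFactory.createURI("http://example.org/thing");
        JenaBackedIri iri = JenaBackedIri.wrap(n);
        System.out.println(iri.getIRIString());
    }
}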

And this is only RDF, not Datasets or SPARQL.  We discussed that and 
fairly easily came to the conclusion that getting something in common 
sooner was better than waiting for a complete set of APIs.  Some of the 
natural extensions are a lot more complicated - they would build on the 
terms provided by commons-rdf.

>> There is a lot more to working with RDF than the RDF API part - SPARQL
>> engines don't use that API if they want performance and/or scale. (1)
>> SPARQL queries collections of graphs and (2) for scale+persistence, you
need to work in parts at a level somewhat lower than Java objects,
>> and closer to the binary of persistence structures.
>
> Good point. I'm enjoying learning about the Jena code for JENA-632. Even though datasets,
> streaming query collections and all that part about journaling and graph persistence can
> be a bit scary.

:-)

Luckily, journalling and persistence are orthogonal to implementing 
JENA-632, though as an application feature mapped over the whole system, 
it's a good way of seeing across several components.

> Probably that won't be covered in the commons-rdf, but I think that's correct.

I agree - there is a new world out there - a world of large-memory 
machines and, quite likely, large-scale persistent RAM in the not too 
distant future.  Given the longevity of shared APIs, it's very hard to 
find a balance across requirements and expectations.  The graph level is 
naturally driven by the specs, but as soon as systems issues get thrown 
into the mix, the choice space is much larger.

	Andy

>
> Thanks!
> Bruno
>
>
> ----- Original Message -----
>> From: Andy Seaborne <andy@apache.org>
>> To: dev@commons.apache.org
>> Cc:
>> Sent: Saturday, January 17, 2015 7:40 AM
>> Subject: Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF
>>
>> On 15/01/15 11:52, Bruno P. Kinoshita wrote:
>>
>>>   Hello!
>>>
>>>   I feel like I can't help much in the current discussion. But just
>>>   wanted to chime in and tell that I'm +1 for a [rdf] component in
>>>   Apache Commons. As a commons committer I'd like to help.
>>>
>>>   I started watching the GitHub repository and have subscribed to the
>>>   ongoing discussion. I'll try to contribute in some way; maybe testing
>>>   and with small patches.
>>>
>>>   My go-to Maven dependency for RDF, Turtle, N3, working with ontologies,
>>>   reasoners, etc, is Apache Jena. I think it would be very positive to
>>>   have a common interface that I could use in my code (mainly crawlers
>>>   and data munging for Hadoop jobs) and that would work with different
>>>   implementations.
>>>
>>>   Thanks!
>>>
>>>   Bruno
>>
>> Since you mention Jena ... :-)
>>
>> Jena can (and does) support multiple APIs over a common core.
>>
>> A commons-rdf API can be added alongside the existing APIs; that means
>> it is not a "big bang" to have commons-rdf interfaces supported.
>>
>> There is a lot more to working with RDF than the RDF API part - SPARQL
>> engines don't use that API if they want performance and/or scale. (1)
>> SPARQL queries collections of graphs and (2) for scale+persistence, you
>> need to work in parts at a level somewhat lower than Java objects,
>> and closer to the binary of persistence structures.
>>
>>      Andy
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

