incubator-any23-dev mailing list archives

From "Peter Ansell (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ANY23-19) Abstract away any specific RDF APIs
Date Fri, 13 Apr 2012 06:36:43 GMT

    [ https://issues.apache.org/jira/browse/ANY23-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253175#comment-13253175 ]

Peter Ansell commented on ANY23-19:
-----------------------------------

Hi Paolo,

The example library that keeps being referred to here, java-rdfa, abstracts away from the
Clerezza, Sesame and Jena interfaces by representing everything as strings. See https://github.com/shellac/java-rdfa/blob/master/core/src/main/java/net/rootdev/javardfa/StatementSink.java
The java-rdfa library is also not being actively developed or used, so it isn't the best example
of a successful library that avoids a type-safe RDF Statement/Value API. I worked on java-rdfa
a little, but there is no push behind it. Even its author refers to it as "The cruftiest
RDFa parser in the world".
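To make the trade-off concrete, here is a minimal sketch of a string-only statement sink in
the spirit of java-rdfa's StatementSink. The interface name, the method names and the "_:"
blank-node convention are assumptions for illustration, not java-rdfa's actual signatures.
Because every term is a plain String, the consumer has to re-derive whether a term is a blank
node or a URI from string conventions.

```java
import java.util.ArrayList;
import java.util.List;

public class StringSinkSketch {

    // Hypothetical string-only sink: every RDF term is passed as a plain String.
    // Loosely modelled on the idea behind java-rdfa's StatementSink; NOT its real API.
    interface StringStatementSink {
        void addObject(String subject, String predicate, String object);
        void addLiteral(String subject, String predicate, String lexical, String lang, String datatype);
    }

    // Collects N-Triples-like lines. Node kinds must be guessed from string
    // conventions (assumed here: blank nodes start with "_:"). No escaping, for brevity.
    static class CollectingSink implements StringStatementSink {
        final List<String> lines = new ArrayList<>();

        private static String term(String s) {
            return s.startsWith("_:") ? s : "<" + s + ">";
        }

        @Override
        public void addObject(String subject, String predicate, String object) {
            lines.add(term(subject) + " " + term(predicate) + " " + term(object) + " .");
        }

        @Override
        public void addLiteral(String subject, String predicate, String lexical, String lang, String datatype) {
            String lit = "\"" + lexical + "\"";
            if (lang != null) {
                lit += "@" + lang;
            } else if (datatype != null) {
                lit += "^^<" + datatype + ">";
            }
            lines.add(term(subject) + " " + term(predicate) + " " + lit + " .");
        }
    }

    public static void main(String[] args) {
        CollectingSink sink = new CollectingSink();
        sink.addObject("_:b0", "http://xmlns.com/foaf/0.1/knows", "http://example.org/alice");
        sink.addLiteral("_:b0", "http://xmlns.com/foaf/0.1/name", "Bob", "en", null);
        for (String line : sink.lines) {
            System.out.println(line);
        }
    }
}
```

Nothing in the types stops a caller from passing a literal's text where a URI is expected;
that is the type-safety gap being discussed.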

Two of the three goals of Any23, its use as a command-line utility and as a web service, are
completely indifferent to the technology used internally, as long as it is high quality, and
in my opinion and experience the Sesame libraries are very high quality. The only goal that
would be affected is the use of Any23 as a library. How often is Any23 currently being used
as a library? Could another project easily implement the same functionality using Jena, with
less effort than it would take either to create a custom string-based solution or to move to
another framework that may have just as many dependencies, or more?

In terms of the use of Any23 as a library, is there anything about the Sesame Model hierarchy
(Value/Resource/Literal/BNode/URI) that would be better represented using a custom solution?
As one example, I have been working with OWLAPI recently and its RDF handling is shocking:
it merges URIs with blank nodes to form what it refers to as IRIs. It has a custom internal
solution that recognises only two types of triples: those with an IRI in the object position
(where IRI does not type-safely distinguish between URI and BlankNode) and those with a Literal
in the object position. I can't imagine Any23 going down this route, but it is the worst-case
scenario if the API is converted without a reason. In the simplest scenario, it may be possible
to reuse the Sesame Model hierarchy to produce Values that work across all three libraries,
using custom ValueImpl (etc.) implementations that actually implement the relevant interfaces
from the other libraries, along with a custom ValueFactory that produces these
multi-library-compatible Values. (Custom ValueFactories can be plugged into any Rio parser
using Rio.getParser(RDFFormat, ValueFactory), a capability I haven't seen in other libraries.)
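The value of a type-safe hierarchy is easiest to see in code. The sketch below uses a tiny,
hypothetical stand-in for the Value/Resource/URI/BNode/Literal split (these are not the
org.openrdf.model interfaces themselves) to show that a consumer can tell a URI from a blank
node in the object position, which is exactly the distinction lost when both are folded into
a single "IRI" type.

```java
public class TypedTermsSketch {

    // Hypothetical mini-hierarchy mirroring the shape of Sesame's model:
    // Value -> Resource -> {URI, BNode}, and Value -> Literal.
    interface Value { String stringValue(); }
    interface Resource extends Value { }

    static final class URI implements Resource {
        private final String iri;
        URI(String iri) { this.iri = iri; }
        public String stringValue() { return iri; }
    }

    static final class BNode implements Resource {
        private final String id;
        BNode(String id) { this.id = id; }
        public String stringValue() { return id; }
    }

    static final class Literal implements Value {
        private final String label;
        Literal(String label) { this.label = label; }
        public String stringValue() { return label; }
    }

    // A consumer can branch on the node kind without inspecting string contents.
    static String describeObject(Value object) {
        if (object instanceof URI) {
            return "URI " + object.stringValue();
        } else if (object instanceof BNode) {
            return "blank node " + object.stringValue();
        } else if (object instanceof Literal) {
            return "literal \"" + object.stringValue() + "\"";
        }
        throw new IllegalArgumentException("unknown value type: " + object);
    }

    // Subjects are restricted to Resource at compile time:
    // passing a Literal here is a compile error.
    static String subjectKind(Resource subject) {
        return subject instanceof BNode ? "blank node" : "URI";
    }

    public static void main(String[] args) {
        System.out.println(describeObject(new URI("http://example.org/a")));
        System.out.println(describeObject(new BNode("b1")));
        System.out.println(describeObject(new Literal("hello")));
        System.out.println(subjectKind(new BNode("b2")));
    }
}
```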

In terms of the packages that are currently used, there are five basic ones: sesame-model,
sesame-rio-api, sesame-repository-api, sesame-sail-api and sesame-sail-memory. These base
libraries are small dependencies. One further dependency is sesame-util, a small set of
utilities used by sesame-model and other Sesame libraries.

82K - sesame-model-2.6.5.jar
36K - sesame-repository-api-2.6.5.jar
22K - sesame-rio-api-2.6.5.jar
56K - sesame-sail-api-2.6.5.jar
54K - sesame-sail-memory-2.6.5.jar
53K - sesame-util-2.6.5.jar

The value Impl classes (such as URIImpl and LiteralImpl) should not be referenced directly.
Values should be created through a ValueFactory and used via their interfaces. This doesn't
change any of the libraries that are used, but it is better practice.
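The recommended pattern, creating values through a factory and holding them only as
interfaces, can be sketched without Sesame on the classpath. The names URI, ValueFactory,
SimpleValueFactory, CountingValueFactory and ToyParser below are hypothetical stand-ins; the
point is that the parser code never names a concrete class, so a different factory (for
example one whose values also implement another library's interfaces) can be swapped in
without touching it.

```java
import java.util.ArrayList;
import java.util.List;

public class FactorySketch {

    // Hypothetical interfaces standing in for org.openrdf.model.URI / ValueFactory.
    interface URI { String stringValue(); }
    interface ValueFactory { URI createURI(String iri); }

    // Default implementation: the only place a concrete URI class is constructed.
    static class SimpleValueFactory implements ValueFactory {
        public URI createURI(final String iri) {
            return new URI() {
                public String stringValue() { return iri; }
            };
        }
    }

    // Alternative factory that delegates and counts creations, standing in for
    // a factory that produces values compatible with other libraries.
    static class CountingValueFactory implements ValueFactory {
        private final ValueFactory delegate = new SimpleValueFactory();
        int created = 0;

        public URI createURI(String iri) {
            created++;
            return delegate.createURI(iri);
        }
    }

    // A toy "parser" that depends only on the ValueFactory interface.
    static class ToyParser {
        private final ValueFactory vf;

        ToyParser(ValueFactory vf) { this.vf = vf; }

        // Treats each whitespace-separated token as an IRI.
        List<URI> parse(String input) {
            List<URI> out = new ArrayList<>();
            for (String token : input.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    out.add(vf.createURI(token));
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        CountingValueFactory counting = new CountingValueFactory();
        List<URI> uris = new ToyParser(counting).parse("http://example.org/a http://example.org/b");
        System.out.println(uris.size() + " URIs, factory created " + counting.created);
    }
}
```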

The other libraries, which provide the Rio parser (and writer) implementations, can be linked
in dynamically without a compile-time dependency, so people using Any23 as a library could
pull them in as needed. See the Rio.getParser(RDFFormat, ValueFactory) and
Rio.getWriter(RDFFormat) methods. It would be valuable if Any23 could pull in all of its
parsers and writers dynamically using the Rio.* static methods; it could then be used with
the absolute minimum number of parsers and writers needed by the current user.

4.6K - sesame-rio-n3-2.6.5.jar
14K - sesame-rio-ntriples-2.6.5.jar
33K - sesame-rio-rdfxml-2.6.5.jar
17K - sesame-rio-turtle-2.6.5.jar
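A format-keyed registry is one way to get the "only the parsers you need" behaviour described
above. In the sketch below, FormatParser, ParserRegistry and LineCountingNTriples are
hypothetical names, not Rio's API. Providers are discovered with the JDK's
java.util.ServiceLoader (which finds implementations listed under META-INF/services on the
classpath) and can also be registered by hand; a missing format yields an empty Optional
rather than requiring a compile-time dependency.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;

public class RegistrySketch {

    // Hypothetical parser SPI: one implementation per syntax, each in its own jar.
    public interface FormatParser {
        String formatName();
        int countStatements(String document);
    }

    // Hypothetical registry keyed by format name.
    public static class ParserRegistry {
        private final Map<String, FormatParser> parsers = new HashMap<>();

        // Registers every FormatParser listed under META-INF/services on the classpath.
        // With no provider jars present, this registers nothing.
        public static ParserRegistry discover() {
            ParserRegistry registry = new ParserRegistry();
            for (FormatParser p : ServiceLoader.load(FormatParser.class)) {
                registry.register(p);
            }
            return registry;
        }

        public void register(FormatParser parser) {
            parsers.put(parser.formatName(), parser);
        }

        public Optional<FormatParser> get(String formatName) {
            return Optional.ofNullable(parsers.get(formatName));
        }
    }

    // Toy N-Triples "parser": counts non-blank, non-comment lines.
    static class LineCountingNTriples implements FormatParser {
        public String formatName() { return "N-Triples"; }

        public int countStatements(String document) {
            int n = 0;
            for (String line : document.split("\n")) {
                String t = line.trim();
                if (!t.isEmpty() && !t.startsWith("#")) {
                    n++;
                }
            }
            return n;
        }
    }

    public static void main(String[] args) {
        ParserRegistry registry = ParserRegistry.discover();
        registry.register(new LineCountingNTriples());
        System.out.println(registry.get("N-Triples").isPresent()); // true
        System.out.println(registry.get("Turtle").isPresent());    // false unless a provider is on the classpath
    }
}
```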

Switching to another library may cause the bloat that you say you do not want.

For example, Jena and its immediate dependencies are quite large compared to the modular
Sesame jar files, and that doesn't include the SPARQL parsing libraries from ARQ (just as the
Sesame libraries quoted above do not include the SPARQL libraries).

1.7M - jena-core-2.7.0-incubating.jar
151K - jena-iri-0.9.0-incubating.jar
3.1M - icu4j-3.4.4.jar
1.4M - xercesImpl-2.10.0.jar
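Summing the sizes quoted above makes the comparison concrete. This sketch assumes the K and M
suffixes are binary units (1M = 1024K), as printed by tools such as `ls -lh`; with decimal
units the totals shift slightly but the ratio barely changes. As noted above, the Jena figures
include xercesImpl and icu4j, and neither side includes SPARQL libraries.

```java
import java.util.Arrays;
import java.util.List;

public class JarSizeSketch {

    // Parses sizes like "82K" or "1.7M" into KiB, assuming 1M = 1024K.
    static double toKiB(String size) {
        double value = Double.parseDouble(size.substring(0, size.length() - 1));
        char unit = size.charAt(size.length() - 1);
        switch (unit) {
            case 'K': return value;
            case 'M': return value * 1024;
            default: throw new IllegalArgumentException("unknown unit in " + size);
        }
    }

    static double totalKiB(List<String> sizes) {
        double total = 0;
        for (String s : sizes) {
            total += toKiB(s);
        }
        return total;
    }

    public static void main(String[] args) {
        // Sizes as quoted in the comment.
        List<String> sesameCore = Arrays.asList("82K", "36K", "22K", "56K", "54K", "53K");
        List<String> sesameRio = Arrays.asList("4.6K", "14K", "33K", "17K");
        List<String> jena = Arrays.asList("1.7M", "151K", "3.1M", "1.4M");

        double sesame = totalKiB(sesameCore) + totalKiB(sesameRio);
        double jenaTotal = totalKiB(jena);
        System.out.printf("Sesame core + Rio: %.1fK%n", sesame);    // 371.6K
        System.out.printf("Jena + deps:       %.1fK%n", jenaTotal); // 6499.8K
        System.out.printf("Ratio:             %.1fx%n", jenaTotal / sesame);
    }
}
```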

                
> Abstract away any specific RDF APIs
> -----------------------------------
>
>                 Key: ANY23-19
>                 URL: https://issues.apache.org/jira/browse/ANY23-19
>             Project: Apache Any23
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Paolo Castagna
>             Fix For: 0.8.0
>
>
> Any23 currently uses Sesame to work with or parse RDF. Specifically, Any23 uses these classes from org.openrdf.* packages:
> org.openrdf.model.BNode
> org.openrdf.model.datatypes.XMLDatatypeUtil
> org.openrdf.model.impl.LiteralImpl
> org.openrdf.model.impl.URIImpl
> org.openrdf.model.impl.ValueFactoryImpl
> org.openrdf.model.Literal
> org.openrdf.model.Resource
> org.openrdf.model.Statement
> org.openrdf.model.URI
> org.openrdf.model.Value
> org.openrdf.model.ValueFactory
> org.openrdf.model.vocabulary.OWL
> org.openrdf.model.vocabulary.RDF
> org.openrdf.model.vocabulary.RDFS
> org.openrdf.model.vocabulary.XMLSchema
> org.openrdf.repository.RepositoryConnection
> org.openrdf.repository.RepositoryException
> org.openrdf.repository.RepositoryResult
> org.openrdf.repository.sail.SailRepository
> org.openrdf.rio.helpers.RDFParserBase
> org.openrdf.rio.ntriples.NTriplesParser
> org.openrdf.rio.ntriples.NTriplesUtil
> org.openrdf.rio.ntriples.NTriplesWriter
> org.openrdf.rio.ParseErrorListener
> org.openrdf.rio.ParseLocationListener
> org.openrdf.rio.RDFFormat
> org.openrdf.rio.RDFHandler
> org.openrdf.rio.RDFHandlerException
> org.openrdf.rio.RDFParseException
> org.openrdf.rio.RDFParser
> org.openrdf.rio.rdfxml.RDFXMLParser
> org.openrdf.rio.rdfxml.RDFXMLWriter
> org.openrdf.rio.turtle.TurtleWriter
> org.openrdf.sail.memory.MemoryStore
> org.openrdf.sail.Sail
> org.openrdf.sail.SailException
> Would it be possible to abstract away any specific RDF APIs to allow Any23 users to choose between, say, Apache Clerezza [1], Apache Jena [2], Sesame [3] and/or others?
> An example of a small RDF distiller which does this is java-rdfa [4]. Maybe a similar agnostic (but easy to integrate) approach is possible for Any23. However, java-rdfa does not need to parse RDF content itself.
>  [1] http://incubator.apache.org/clerezza/
>  [2] http://incubator.apache.org/jena/
>  [3] http://www.openrdf.org/
>  [4] https://github.com/shellac/java-rdfa

--
This message is automatically generated by JIRA.
