incubator-any23-dev mailing list archives

From Peter Ansell <ansell.pe...@gmail.com>
Subject Split out modules for mime and nquads
Date Thu, 30 Aug 2012 02:10:26 GMT
Hi all,

I have created a set of 3 related patches to split out the mime and
nquads functionality into separate modules that do not depend on the
core module. This will make it easier for people to reuse both of
these modules. The main goal for the patch was to split out the mime
module so that it can be used without pulling in everything from core,
but that requires the nquads module to be split out at the same time
to enable tests to be moved from core/src/test/java to
mime/src/test/java.

Unfortunately there is a single class, CSVReaderBuilder, that doesn't
seem to fit in the api module, and is required by the mime type
detector to detect csv documents. I created a separate module called
csvutils that has dependencies on apache commons csv and the api
module. It needs the api module to pull in a DefaultConfiguration
object.

In addition, I removed the compile time dependency for the core module
on the nquads module, so that it can be interchangeable with a Sesame
Rio N-Quads module when the CLAs to Aduna/Vound are completed and I
get time to implement it. I removed the hard dependencies on most of
the Rio modules where possible; however, there were still two cases
where there are hard dependencies. Firstly, the N-Quads implementation
has a compile time dependency on the Sesame Rio NTriples classes. This
does not cause a hard-dependency on ntriples, as the nquads module
itself is a runtime dependency for core and mime. The only
hard-dependency on a parser module for core is through a custom
TurtleParser extension class that is used in RDFParserFactory to set
the base prefix to the baseURI when parsing, and it is not apparent
how that could be fixed, as there is no "on parse start" hook for
RDFParser. These changes resulted in a large number of minor changes
to standardise references to Rio RDFParser using Rio.createParser
wherever possible. These include some references to RDFParserBase,
which is an implementation class and not in the basic OpenRDF Rio API.

The changes also match the Any23 preferred mime type for N-Quads with
the mime type given in the initial specification, i.e. "text/x-nquads".
The alternative mime types are still supported, but the Tika
configuration now returns text/x-nquads if it is given one of the
aliases. In addition, the patch also switches the tika turtle mime
type to the type contained in the W3C Team Submission, "text/turtle".
As with N-Quads, the Tika configuration for Turtle still contains the
alternative mime type "application/x-turtle", but should now return
"text/turtle" instead of the alias. There were a large number of
places throughout the Any23 codebase where these mime types were
hardcoded, so to reduce that number to make things manageable I
switched to using RDFFormat.getDefaultMimeType(), and
RDFFormat.hasMimeType, which checks against both the default and any
alternative mime types defined in the RDFFormat that is being
referenced.
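
As a rough illustration of the lookup described above, here is a minimal
sketch. FormatSketch and its canonicalize method are hypothetical stand-ins
written for this example; they are not the Sesame RDFFormat class, and only
the method names getDefaultMimeType and hasMimeType are taken from the text
above. The Turtle mime types are the ones mentioned in this message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a format descriptor; NOT the Sesame RDFFormat class.
final class FormatSketch {
    private final String defaultMimeType;
    private final List<String> mimeTypes = new ArrayList<String>();

    FormatSketch(String defaultMimeType, String... aliases) {
        this.defaultMimeType = defaultMimeType;
        // The default type comes first, followed by any alternative types.
        this.mimeTypes.add(defaultMimeType);
        this.mimeTypes.addAll(Arrays.asList(aliases));
    }

    String getDefaultMimeType() {
        return defaultMimeType;
    }

    // True if the candidate matches the default or any alternative mime type.
    boolean hasMimeType(String candidate) {
        if (candidate == null) {
            return false;
        }
        for (String known : mimeTypes) {
            if (known.equalsIgnoreCase(candidate.trim())) {
                return true;
            }
        }
        return false;
    }

    // Assumed helper: map a known alias to the default type, otherwise
    // return the input unchanged.
    String canonicalize(String candidate) {
        return hasMimeType(candidate) ? defaultMimeType : candidate;
    }

    public static void main(String[] args) {
        FormatSketch turtle = new FormatSketch("text/turtle", "application/x-turtle");
        System.out.println(turtle.hasMimeType("application/x-turtle"));
        System.out.println(turtle.canonicalize("application/x-turtle"));
    }
}
```

Callers that compare against a format object in this way do not need to
repeat the literal mime type strings, which is the point of removing the
hardcoded values.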

One other change that may affect some operations is the switch in
NQuadsParser from using the default user locale to define the charset
for InputStreams to explicitly using "UTF-8", which may have been what
was desired in the past anyway. I also switched the order of parsing
in NQuadsParser to avoid importing the custom Any23
ReaderInputStream class: the parse process now focuses on Reader
instead of InputStream, using the standard Java InputStreamReader
class to wrap any InputStream.
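
The general pattern of wrapping an InputStream in an InputStreamReader with
an explicit charset can be sketched as follows. This is not the actual
NQuadsParser code; Utf8ReaderSketch and its methods are hypothetical names
used only for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Any23 or Sesame.
final class Utf8ReaderSketch {

    // Decode the stream as UTF-8 regardless of the platform default charset,
    // so that all further parsing can work on a Reader.
    static Reader toReader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    // Read the first line of the stream through the UTF-8 reader.
    static String readFirstLine(InputStream in) {
        try (BufferedReader reader = new BufferedReader(toReader(in))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String quad = "<http://example.org/s> <http://example.org/p> "
                + "\"caf\u00e9\" <http://example.org/g> .\n";
        byte[] bytes = quad.getBytes(StandardCharsets.UTF_8);
        System.out.println(readFirstLine(new ByteArrayInputStream(bytes)));
    }
}
```

With the charset fixed in one place, the same bytes decode identically on
every machine, whatever its locale settings.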

You can review the patches at:

https://github.com/ansell/any23/compare/ansell:trunk...ansell:mime-module

Either commenting inline at GitHub or here on the mailing list is fine with me.

The patches relate to three ANY23 Jira Issues:

* ANY23-85 : Splitting out the NQuads parser and writer into its own module
* ANY23-117 : Split out mime type detection into its own module
* ANY23-83 : Removing hardcoded formats to make Any23 more flexible as
a modular library

Although the branch contains three independent patches, I did not
create them initially that way, so they may contain bugs if you test
them individually. In particular, there are references to csvutils and
mime modules in the nquads patch. If necessary I could further
refactor them, but if all three are okay I will submit them all at the
same time.

Cheers,

Peter
