incubator-any23-dev mailing list archives

From Lewis John Mcgibbney
Subject Re: Splitting up Any23 into a more modular format
Date Fri, 11 May 2012 09:21:01 GMT
Hi Peter,

As I said on the issue at [1], this looks like exciting work. I'm
hoping that it sparks some conversation amongst us.

As we are pushing for our first incubating release, I'm not entirely
sure that the restructuring is a viable option just now. However, we
should certainly not rule it out unless there is a justified argument
against it.

Thank you for the heads up on this.


On Fri, May 11, 2012 at 7:41 AM, Peter Ansell wrote:
> Hi all,
> Over the past two days I have split up Any23 into a variety of modules
> to make it easier to use different parts of the Any23 API. You can see
> the code at [1]. The current module list in the parent pom reactor
> looks like:
>  <modules>
>    <module>api</module>
>    <module>csvutils</module>
>    <module>encoding</module>
>    <module>mime</module>
>    <module>core</module>
>    <module>test-resources</module>
>    <module>extractor</module>
>    <module>cli</module>
>    <module>test</module>
>    <module>service</module>
>    <module>plugins/basic-crawler</module>
>    <module>plugins/html-scraper</module>
>    <module>plugins/office-scraper</module>
>    <module>plugins/integration-test</module>
>    <module>sources-dist</module>
>  </modules>
> None of the modules listed above core depend on core, and the core
> module depends only on the api module.
> The api module mostly contains interfaces, but it also contains the
> factory registries that are fully Service Provider Interface (SPI)
> driven: Any23PluginManager and WriterFactoryRegistry, which I created
> to alleviate the hardcoded dependencies and reflection/annotation
> code in WriterRegistry that are not easy to extend outside of the
> core library. The ExtractorRegistry was too difficult to convert to
> SPI just yet, so I split it into an interface and an implementation
> (ExtractorRegistryImpl), with the interface in the api module and used
> in some APIs where the singleton was previously used. These
> registries, together with Rio RDFFormat for referencing RDF format
> information, seemed to be enough to remove the hardcoding that I have
> been discussing at
> The changes fit my purposes as I can easily slot in the encoding and
> mime detection code without pulling in the core or extractor modules,
> and the supported types for the mime detection include any formats I
> register with OpenRDF Rio so it is extensible and modular for my
> purposes.
> However, most of the changes are too large for easy patching and I
> didn't arrange the changes into nice patches throughout as I was not
> sure what was going to happen in the end. I have submitted two very
> small patches to that issue, but there could be many more eventually
> if the redesigned code is acceptable.
> Note, I also removed the Any23 NQuads implementation as it was missing
> Factory implementations for the writer and parser classes so it wasn't
> being picked up by Rio.createParser or any of the other static Rio
> methods. I replaced it with the NQuads implementation from Sesametools
> which includes these factories and so is recognised. When
> gets implemented both of
> these implementations will likely be deprecated anyway so it wasn't a
> major issue for me. I would suggest in either case splitting out the
> NQuads classes into a separate module and implementing a Factory for
> both the parser and writer so they are picked up by SPI.
> There were some existing broken tests when I started, and there were a
> small number of tests that broke throughout, including one that broke
> when I updated to Tika-1.1. They are temporarily ignored, but can be
> found easily by checking the ignored tests when running the test
> suite.
> I hope the changes are useful to others.
> If you want to suggest changes to my version on GitHub feel free to
> open an issue or fork the repository and send a pull request back.
> Cheers,
> Peter
> [1]
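
For readers unfamiliar with the SPI mechanism Peter refers to, the following is a minimal sketch of a registry populated through java.util.ServiceLoader. It is not the Any23 or Rio code: FormatWriterFactory, Registry, and NTriplesWriterFactory are hypothetical names invented for illustration, and the toy writer only formats a single triple. It assumes providers would be registered in a META-INF/services file named after the interface.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

public class WriterRegistrySketch {

    /** Hypothetical SPI interface; in a modular layout it would live in the api module. */
    public interface FormatWriterFactory {
        String mimeType();
        String writeTriple(String subject, String predicate, String object);
    }

    /** Registry that knows only the interface, never a concrete writer class. */
    public static final class Registry {
        private final Map<String, FormatWriterFactory> byMimeType = new LinkedHashMap<>();

        public Registry(Iterable<? extends FormatWriterFactory> factories) {
            for (FormatWriterFactory factory : factories) {
                byMimeType.put(factory.mimeType(), factory);
            }
        }

        /**
         * Discovers every provider named in a classpath resource called
         * META-INF/services/<binary name of FormatWriterFactory>.
         * With no such resource present, the registry is simply empty.
         */
        public static Registry fromClasspath() {
            return new Registry(ServiceLoader.load(FormatWriterFactory.class));
        }

        public FormatWriterFactory forMimeType(String mimeType) {
            return byMimeType.get(mimeType);
        }

        public int size() {
            return byMimeType.size();
        }
    }

    /** Hypothetical provider; a separate module could ship this and register it via SPI. */
    public static final class NTriplesWriterFactory implements FormatWriterFactory {
        @Override
        public String mimeType() {
            return "application/n-triples";
        }

        @Override
        public String writeTriple(String subject, String predicate, String object) {
            return "<" + subject + "> <" + predicate + "> <" + object + "> .";
        }
    }

    public static void main(String[] args) {
        // SPI discovery: finds nothing unless a provider file is on the classpath.
        Registry discovered = Registry.fromClasspath();
        System.out.println("Factories discovered via SPI: " + discovered.size());

        // Manual wiring, shown only so the sketch runs without a provider file.
        Registry manual = new Registry(Collections.singletonList(new NTriplesWriterFactory()));
        FormatWriterFactory factory = manual.forMimeType("application/n-triples");
        System.out.println(factory.writeTriple(
                "http://example.org/s", "http://example.org/p", "http://example.org/o"));
    }
}
```

The point of the design is that the registry compiles against the interface alone, so a new format can be added by dropping a jar with a provider file onto the classpath rather than by editing a hardcoded list in the core library.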

