incubator-any23-dev mailing list archives

From Michele Mostarda <>
Subject Re: Splitting up Any23 into a more modular format
Date Fri, 11 May 2012 09:55:31 GMT
Hi Peter,

 I had a really quick look at your contribution; thanks for your effort.

 What I suggest is to provide your modifications as (possibly small)
patches (as you've already done).

On 11 May 2012 08:41, Peter Ansell <> wrote:

> Hi all,
> Over the past two days I have split up Any23 into a variety of modules
> to make it easier to use different parts of the Any23 API. You can see
> the code at [1]. The current module list in the parent pom reactor
> looks like:
>  <modules>
>    <module>api</module>
>    <module>csvutils</module>
>    <module>encoding</module>
>    <module>mime</module>
>    <module>core</module>
>    <module>test-resources</module>
>    <module>extractor</module>
>    <module>cli</module>
>    <module>test</module>
>    <module>service</module>
>    <module>plugins/basic-crawler</module>
>    <module>plugins/html-scraper</module>
>    <module>plugins/office-scraper</module>
>    <module>plugins/integration-test</module>
>    <module>sources-dist</module>
>  </modules>

The modularization refactoring at this stage introduces some complexity and
must be discussed with the community on this mailing list, in particular with
the Release Manager (who has to deal with all these modules :) ).

> All of the modules above core do not have dependencies on core, and
> the core module only has a dependency on the api module.
> The api module mostly contains interfaces but it also contains factory
> registries where they are fully Service Provider Interface (SPI)
> driven (Any23PluginManager and WriterFactoryRegistry which I created
> to alleviate the WriterRegistry hardcoding dependencies and
> reflection/annotation code that isn't easy to extend outside of the
> core library). The ExtractorRegistry was too difficult to convert to
> SPI just yet so I split it up into an interface and an implementation
> (ExtractorRegistryImpl) with the interface in the API module and used
> in some APIs where the singleton was previously used. These
> registries, together with Rio RDFFormat for referencing RDF format
> information, seemed to be enough to remove the hardcoding that I have
> been discussing at

That's really good.
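
For readers following the thread, here is a rough sketch of what an SPI-driven
registry of the kind described above could look like, using the standard
java.util.ServiceLoader mechanism. All type names in it (WriterFactory,
SimpleWriterFactory, Registry) and the "nquads" / "text/x-nquads" values are
hypothetical and chosen only for illustration; they are not the actual classes
in Peter's branch or in Any23, and the real design may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: every type name here is hypothetical and is not
// the actual Any23 or Rio API.
class WriterRegistrySketch {

    /** Hypothetical service interface that would live in the api module. */
    public interface WriterFactory {
        String getIdentifier();
        String getMimeType();
    }

    /** Trivial implementation, used only by the demo in main(). */
    static final class SimpleWriterFactory implements WriterFactory {
        private final String identifier;
        private final String mimeType;

        SimpleWriterFactory(String identifier, String mimeType) {
            this.identifier = identifier;
            this.mimeType = mimeType;
        }

        @Override
        public String getIdentifier() {
            return identifier;
        }

        @Override
        public String getMimeType() {
            return mimeType;
        }
    }

    /** Registry with no hardcoded implementations: it only knows the interface. */
    static final class Registry {
        private final Map<String, WriterFactory> byId = new ConcurrentHashMap<>();

        Registry() {
            // ServiceLoader finds implementations listed in provider-configuration
            // files under META-INF/services/ on the classpath. With no such file
            // present, the loop simply does nothing.
            for (WriterFactory factory : ServiceLoader.load(WriterFactory.class)) {
                register(factory);
            }
        }

        /** Manual registration, e.g. for tests or programmatic setup. */
        void register(WriterFactory factory) {
            byId.put(factory.getIdentifier(), factory);
        }

        /** Returns the factory for the identifier, or null if none is registered. */
        WriterFactory get(String identifier) {
            return byId.get(identifier);
        }

        /** Returns all registered identifiers in sorted order. */
        List<String> identifiers() {
            List<String> ids = new ArrayList<>(byId.keySet());
            Collections.sort(ids);
            return ids;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register(new SimpleWriterFactory("nquads", "text/x-nquads"));
        System.out.println(registry.identifiers());
    }
}
```

The point of the pattern is that the registry depends only on the interface,
so a module outside core can contribute a factory just by shipping a
provider-configuration file alongside its implementation.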

> The changes fit my purposes as I can easily slot in the encoding and
> mime detection code without pulling in the core or extractor modules,
> and the supported types for the mime detection include any formats I
> register with OpenRDF Rio so it is extensible and modular for my
> purposes.
> However, most of the changes are too large for easy patching and I
> didn't arrange the changes into nice patches throughout as I was not
> sure what was going to happen in the end. I have submitted two very
> small patches to that issue, but there could be many more eventually
> if the redesigned code is acceptable.

I understand, but it is difficult and time-consuming for us to pull
from an external repository.

> Note, I also removed the Any23 NQuads implementation as it was missing
> Factory implementations for the writer and parser classes so it wasn't
> being picked up by Rio.createParser or any of the other static Rio
> methods. I replaced it with the NQuads implementation from Sesametools
> which includes these factories and so is recognised. When
> gets implemented both of
> these implementations will likely be deprecated anyway so it wasn't a
> major issue for me. I would suggest in either case splitting out the
> NQuads classes into a separate module and implementing a Factory for
> both the parser and writer so they are picked up by SPI.

That would be fine.

> There were some existing broken tests when I started, and there were a
> small number of tests that broke throughout, including one that broke
> when I updated to Tika-1.1. They are temporarily ignored, but can be
> found easily by checking the ignored tests when running the test
> suite.

This is bad: MIME type detection is really central to the use cases
covered by Any23's main users.

> I hope the changes are useful to others.

I think so; it would be nice to have you more involved in the group.

> If you want to suggest changes to my version on GitHub feel free to
> open an issue or fork the repository and send a pull request back.


> Cheers,
> Peter

Thanks a lot!
All the best.


> [1]

Michele Mostarda
Senior Software Engineer
skype: michele.mostarda
twitter: micmos
site :
