incubator-any23-dev mailing list archives

From Lewis John Mcgibbney <>
Subject Upgrade to Tika 1.2 [WAS] Re: [ANNOUNCE] Welcome Peter Ansell as Any23 PPMC member and committer
Date Tue, 07 Aug 2012 11:34:17 GMT
Hi Peter,

Firstly, thanks for the formal introduction; glad that you're now
officially on board.

I've changed the thread topic slightly to discuss the work you have
done on your GitHub branch regarding the Tika upgrade. I see that you're
using Tika 1.1? Would it be possible to phase this into the existing
codebase before doing the module restructuring that we are currently
discussing elsewhere?

I vaguely remember you saying that there were some problems with tests
or something similar (following the Tika dependency upgrade), but I cannot
confirm this just now, and it would be great if you could refresh my
memory.

If we could review some of your work more incrementally (with the
intention of merging it back into trunk), then I think we can phase it
in more quickly... does this make sense?

Thanks very much

On Tue, Aug 7, 2012 at 1:09 AM, Peter Ansell <> wrote:
> Hi all,
> I am a software engineer with a PhD in Computer Science. I have worked
> on a number of RDF related projects since the start of my PhD, mainly
> using Sesame, including also integrating Sesame with OWLAPI [1] over
> the last few months to suit my current project's needs.
> I am looking in the short term to restructure the Maven modules inside
> of Any23 so that the different facets can be reused, tested and
> maintained easily, particularly with a view to using the RDF related
> Tika enhancements that the Any23 MIME Detector provides. I made these
> changes a few months ago in my GitHub fork [2], so feel free to review
> them closely to suggest enhancements before I actually start. I am not
> sure when I will next have time to clean up the patches. The first
> step that I want to take is to split out the test resources into a
> single module and switch from "src/test/resources/*" File based access
> in tests to using this.getClass().getResourceAsStream("*"). I have
> implemented those changes in my git repository but the patches may
> need cleaning up as I have not gone back to review them yet. After
> that is done, it will be relatively simple to split out both the
> packages and tests into separate modules.
> In the short term I have also been tasked by the Sesame Developers
> with merging the Any23 and Sesametools NQuads parsers and integrating
> the resulting module into the Sesame Rio package. Then we can have a
> rock-solid, standards-based, NQuads parser/writer that everyone can
> easily reuse in a similar way to the other Rio parsers/writers. This
> is the culmination of the
> issue that Michele opened over a year ago.
> Cheers,
> Peter
> [1]
> [2]
> On 4 August 2012 12:25, Mattmann, Chris A (388J)
> <> wrote:
>> Hi Folks,
>> A while back, the Any23 PPMC and the Incubator PMC VOTEd to add Peter Ansell
>> to our ranks as a PPMC member and committer. Peter, welcome!
>> Feel free to say a bit about yourself!
>> Cheers,
>> Chris
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email:
>> WWW:
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
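
The switch Peter describes above, from "src/test/resources/*" File based access to getResourceAsStream, could look roughly like the sketch below. This is not taken from his fork or from the Any23 codebase: the class name TestResources and the method read are hypothetical, chosen only for illustration. The one real API used is Class.getResourceAsStream, which resolves a name against the classpath rather than the working directory, so the same test code keeps working once the resources are packaged in a separate module.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper; not part of Any23. */
public class TestResources {

    /**
     * Reads a classpath resource fully into memory.
     * A leading '/' makes the name absolute; otherwise it is resolved
     * relative to the package of the anchor class.
     */
    public static byte[] read(Class<?> anchor, String name) {
        // getResourceAsStream returns null when the resource is missing;
        // try-with-resources tolerates a null resource and skips close().
        try (InputStream in = anchor.getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException(
                        "Resource not found on classpath: " + name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would then call something like TestResources.read(getClass(), "/some/fixture.nq") instead of opening new File("src/test/resources/some/fixture.nq"); the fixture path here is made up. Failing fast on a missing resource avoids the NullPointerException that a bare getResourceAsStream call tends to produce later on.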

