incubator-general mailing list archives

From Jeremias Maerki <>
Subject Re: [VOTE] Tika - a content analysis toolkit
Date Sun, 18 Mar 2007 17:16:55 GMT
non-binding +1 from me.

On 18.03.2007 10:51:37 Jukka Zitting wrote:
> [ ] +1 Accept Tika as a new podling
> [ ] -1 Do not accept the new podling (provide reason, please)
> Instead of implementing its own document parsers, Tika will use existing
> parser libraries like Jakarta POI [1] and PDFBox [2].

I would like to make the Tika people aware that we've recently started a
small XMP framework as part of the XML Graphics Project. XMP is used
with a number of document formats, PDF being the most prominent one.
It could be interesting to work together on this. I've also been in
contact with Ben Litchfield, author of PDFBox, about possibly joining
forces on this topic, but not much has happened yet. At the moment, the
XMP code covers only what is necessary to implement the bare basics
of the PDF/A-1b specification, but I'm sure it can easily be extended to
serve a wider audience. I already see the need to take the code a step
further in order to cover the extension schemas mandated by the
PDF/A-1 standard. Finally, the code doesn't necessarily have to stay
within XML Graphics, I guess, but that's just me speaking.

Jeremias Maerki (watching with interest)
