tika-user mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Testing Tika
Date Mon, 30 Mar 2009 17:12:29 GMT

On Mon, Mar 30, 2009 at 6:20 PM, Mark Kerzner <markkerzner@gmail.com> wrote:
> I am testing Tika on a set of about 200 documents, and I am planning more
> extensive tests. At the moment, it crashes on one PDF and on quite a few
> html files.

Does "crashes" mean a) "process terminates with a core dump", b) "a
RuntimeException (or an Error) is thrown" or c) "a TikaException is
thrown"?

> Of course I realize that these questions should be addressed to individual
> converter maintainers, that is, PDFBox and Neko html parser.

For cases a) and b) we clearly should do something in Tika, but case
c) is pretty much in the realm of the parser library we use. If a
parser library doesn't work, we can either try to submit a patch there
and upgrade to a more recent release, or replace the library with an
alternative that works better. Adding Tika-specific customizations or
workarounds isn't a viable strategy in the long run.
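
To tell cases b) and c) apart while running a test set, something along
these lines might help. It is only a rough sketch: CrashTriage, Outcome and
the parseCall argument are made-up names, and the nested TikaException is a
stand-in for org.apache.tika.exception.TikaException so that the snippet
compiles without Tika on the classpath. Case a) cannot be caught from inside
the JVM at all, so it is not covered.

```java
import java.util.concurrent.Callable;

// Sketch only: sorts a parse failure into cases b) and c) from the message.
// Case a), a core dump, kills the JVM and cannot be observed from inside it.
class CrashTriage {

    // Stand-in for org.apache.tika.exception.TikaException so the sketch
    // compiles without Tika on the classpath; use the real class in practice.
    static class TikaException extends Exception {
        TikaException(String message) { super(message); }
    }

    enum Outcome { OK, RUNTIME_OR_ERROR, TIKA_EXCEPTION, OTHER_CHECKED }

    // parseCall is a hypothetical wrapper around a single parser invocation.
    static Outcome classify(Callable<?> parseCall) {
        try {
            parseCall.call();
            return Outcome.OK;
        } catch (TikaException e) {
            // Case c): the parser library reported that it could not cope.
            return Outcome.TIKA_EXCEPTION;
        } catch (RuntimeException | Error e) {
            // Case b): an unchecked failure escaped from the parser.
            return Outcome.RUNTIME_OR_ERROR;
        } catch (Exception e) {
            // Any other checked exception, for example an IOException.
            return Outcome.OTHER_CHECKED;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> "extracted text"));
        System.out.println(classify(() -> { throw new TikaException("bad PDF"); }));
        System.out.println(classify(() -> { throw new NullPointerException(); }));
    }
}
```

Running each test document through a wrapper like this and counting the
outcomes would show which failures point at Tika itself and which ones
belong upstream with the parser library.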


Jukka Zitting
