tika-user mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Testing Tika
Date Mon, 30 Mar 2009 17:12:29 GMT
Hi,

On Mon, Mar 30, 2009 at 6:20 PM, Mark Kerzner <markkerzner@gmail.com> wrote:
> I am testing Tika on a set of about 200 documents, and I am planning more
> extensive tests. At the moment, it crashes on one PDF and on quite a few
> HTML files.

Does "crashes" mean a) "process terminates with a core dump", b) "a
RuntimeException (or an Error) is thrown" or c) "a TikaException is
thrown"?

> Of course I realize that these questions should be addressed to individual
> converter maintainers, that is, PDFBox and Neko html parser.

For cases a) and b) we clearly should do something in Tika, but case
c) is pretty much in the realm of the parser library we use. If a
parser library doesn't work, we can either try to submit a patch there
and upgrade to a more recent release, or replace the library with an
alternative that works better. Adding Tika-specific customizations or
workarounds isn't a viable strategy in the long run.
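A test run like the one Mark describes can record which of the three categories each failure falls into. The following is a minimal sketch in Java, Tika's own language. The `CrashTriage` class, its `Outcome` labels and `FakeParseException` are hypothetical names for illustration, not Tika APIs. The class deliberately avoids depending on Tika so that it compiles on its own. Case a) cannot be caught from inside the crashing JVM, so the sketch only separates b) from c).

```java
import java.util.concurrent.Callable;

// Hypothetical test-harness helper (not part of Tika): sorts the outcome of
// one parse attempt into the failure categories discussed above.
// Case a), a JVM crash with a core dump, cannot be observed from inside the
// same process; detecting it would need one child process per document.
public class CrashTriage {

    // Hypothetical outcome labels.
    public enum Outcome {
        OK,                // parse returned normally
        UNCHECKED_FAILURE, // case b): a RuntimeException or Error escaped
        PARSER_EXCEPTION,  // case c): the library reported a parse failure
        OTHER_CHECKED      // some other checked exception
    }

    // Hypothetical stand-in for a parser library's checked exception type.
    public static class FakeParseException extends Exception {
        public FakeParseException(String msg) { super(msg); }
    }

    // Runs one parse attempt and classifies how it ended. parserError is the
    // checked exception type the parser uses to report failures; with Tika
    // this would be org.apache.tika.exception.TikaException.
    public static Outcome classify(Callable<?> attempt,
                                   Class<? extends Exception> parserError) {
        try {
            attempt.call();
            return Outcome.OK;
        } catch (RuntimeException e) {
            return Outcome.UNCHECKED_FAILURE;
        } catch (Exception e) {
            return parserError.isInstance(e)
                    ? Outcome.PARSER_EXCEPTION
                    : Outcome.OTHER_CHECKED;
        } catch (Error e) {
            // Catching Error is normally a bad idea; here it is only recorded
            // so that one bad document does not end the whole test run.
            return Outcome.UNCHECKED_FAILURE;
        }
    }

    public static void main(String[] args) {
        // Simulate a parser that rejects its input with a checked exception.
        Outcome o = classify(
                () -> { throw new FakeParseException("bad file"); },
                FakeParseException.class);
        System.out.println(o); // prints PARSER_EXCEPTION
    }
}
```

In a real harness, the `Callable` would wrap a Tika parse call, for example an `AutoDetectParser.parse(...)` invocation. `TikaException.class` would be passed as the second argument. Tallying the outcomes over the test documents then separates the failures Tika itself should handle (b) from those that belong with the upstream parser library (c).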

BR,

Jukka Zitting
