jackrabbit-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Getting rid of jackrabbit-text-extractors
Date Wed, 08 Apr 2009 12:13:13 GMT
Hi,

On Wed, Apr 8, 2009 at 1:43 PM, Marcel Reutegger
<marcel.reutegger@gmx.net> wrote:
> On Tue, Apr 7, 2009 at 23:29, Jukka Zitting <jukka.zitting@gmail.com> wrote:
>> Thus Jackrabbit 1.6 would no longer contain a separate text-extractors
>> jar, but all the existing TextExtractor classes would still be
>> included. In Jackrabbit 2.0 we'd drop all the TextExtractors and only
>> use Tika Parsers.
>
> hmm, this adds quite some dependencies to jackrabbit-core.

Currently we already have quite a few parsing dependencies through
jackrabbit-text-extractors. Tika has even more, but with JCR-1878
we're already including them.

There's been some discussion in Tika about splitting Tika into a core
jar with no dependencies (or just a few, like commons-io) and one or
more separate parser jars containing the Parser implementations that
depend on the various parser libraries like POI. I could push that
idea forward in Tika if it would be useful in Jackrabbit.
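
One way to picture the proposed split is a dependency-free core that only defines a parser contract and discovers implementations at runtime, so code compiled against the core never references POI or any other parser library directly. The sketch below assumes discovery through the JDK's `java.util.ServiceLoader`; the `ParserRegistry` class and `DocumentParser` interface are hypothetical names for illustration, not Tika's actual API.

```java
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.Set;

// Hypothetical sketch of a dependency-free "core" jar: it defines only the
// parser contract and a lookup helper; no parser library is referenced here.
public class ParserRegistry {

    /** Hypothetical contract that a separate parser jar would implement. */
    public interface DocumentParser {
        /** MIME types this implementation can handle. */
        Set<String> supportedTypes();

        /** Extracts plain text from the raw document bytes. */
        String extractText(byte[] document) throws Exception;
    }

    /**
     * Returns the first registered parser that supports the given MIME type.
     * Implementations are found through META-INF/services entries named after
     * the interface's binary name; with only the core jar on the classpath
     * there are no entries and the result is empty.
     */
    public static Optional<DocumentParser> findParser(String mimeType) {
        for (DocumentParser parser : ServiceLoader.load(DocumentParser.class)) {
            if (parser.supportedTypes().contains(mimeType)) {
                return Optional.of(parser);
            }
        }
        return Optional.empty();
    }
}
```

With only the core on the classpath, `findParser` returns an empty `Optional`; dropping in a parser jar that ships a provider entry makes implementations available without recompiling anything that depends on the core.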

> What if we kept the dependency from jackrabbit-core to
> jackrabbit-jcr-tests at version 1.5 but at the same time flag it
> optional? That would remove it from the dependency tree but you'd
> still have it in the pom (until we remove it in 2.0).

(I assume you mean jackrabbit-text-extractors)

The SearchIndex class currently has a hard dependency on TextExtractor
that must also be satisfied at runtime, so we can't make the
text-extractors dependency optional without changing things. I'd
prefer to replace that dependency with one on the Tika Parser
interface, but then we need a hard Maven dependency on Tika.
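
The runtime point can be illustrated with a small adapter sketch. If a class such as SearchIndex declares fields or parameters of type TextExtractor, the JVM typically needs to load that type whenever the code path using it is resolved, so an optional Maven dependency would only turn a build-time guarantee into a possible `NoClassDefFoundError`. Depending on a parser interface instead, with an adapter that wraps existing extractors, keeps old extractor code usable. All types below are hypothetical, simplified stand-ins: `LegacyExtractor` is modelled loosely on Jackrabbit's TextExtractor, and `TextParser` is not the real Tika Parser signature (which reports content through a SAX `ContentHandler`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;

public class ExtractorAdapterSketch {

    /** Hypothetical stand-in modelled loosely on Jackrabbit's TextExtractor. */
    public interface LegacyExtractor {
        String[] getContentTypes();

        Reader extractText(InputStream stream, String type, String encoding)
                throws IOException;
    }

    /** Hypothetical, simplified parser interface (not Tika's actual Parser API). */
    public interface TextParser {
        String parse(InputStream stream, String type) throws IOException;
    }

    /** Wraps an existing extractor so it can be used wherever a TextParser is expected. */
    public static TextParser adapt(LegacyExtractor extractor, String encoding) {
        return (stream, type) -> {
            // Drain the extractor's Reader into a String and close it afterwards.
            try (Reader reader = extractor.extractText(stream, type, encoding)) {
                StringWriter out = new StringWriter();
                char[] buffer = new char[4096];
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toString();
            }
        };
    }

    /** Trivial plain-text extractor used to demonstrate the adapter. */
    public static final LegacyExtractor PLAIN_TEXT = new LegacyExtractor() {
        @Override
        public String[] getContentTypes() {
            return new String[] {"text/plain"};
        }

        @Override
        public Reader extractText(InputStream stream, String type, String encoding)
                throws IOException {
            // Fall back to UTF-8 when the caller does not know the encoding.
            return new InputStreamReader(stream, encoding == null ? "UTF-8" : encoding);
        }
    };
}
```

For example, `adapt(PLAIN_TEXT, "UTF-8").parse(in, "text/plain")` returns the stream's text as a String. The adapter direction (old extractors behind the new interface) is what lets the new code depend only on the parser type while existing extractor classes keep working.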

In either case I think it's best for everyone if the current
TextExtractor classes remain on the runtime classpath (in either
the text-extractors or the core jar) so that there's no need to modify
existing configurations.
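
The concern about existing configurations is that extractor classes are typically named as strings in the repository configuration and instantiated reflectively, so a class that has disappeared from the classpath surfaces only at startup, not at compile time. A minimal sketch of that check, assuming a comma-separated list of class names; `ConfiguredClassCheck` and `findMissingClasses` are hypothetical names, not Jackrabbit code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports configured class names that cannot be loaded.
public class ConfiguredClassCheck {

    /**
     * Tries to load every class named in a comma-separated configuration value
     * and returns the names that are not usable on the runtime classpath.
     */
    public static List<String> findMissingClasses(String configValue) {
        List<String> missing = new ArrayList<>();
        for (String raw : configValue.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // tolerate stray or trailing commas
            }
            try {
                // initialize=false: locate and load the class without running
                // its static initializers.
                Class.forName(name, false, ConfiguredClassCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers classes that are present but whose own
                // dependencies (for example a parser library) are missing.
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Run against the extractor list of an existing configuration, an empty result means every configured class is still loadable, which is the property the message argues for preserving.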

BR,

Jukka Zitting
