jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Getting rid of jackrabbit-text-extractors
Date Wed, 08 Apr 2009 11:43:54 GMT
Hi,

On Tue, Apr 7, 2009 at 23:29, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> JCR-1878 is now resolved and Jackrabbit trunk is depending on Apache
> Tika for text extraction functionality. Thus there is little more need
> for jackrabbit-text-extractors as a standalone component. Anyone who
> needs that functionality separately from jackrabbit-core should just
> go for Tika directly.

+1

> For backwards compatibility with existing configurations (and
> potential extensions) we still need the current
> org.apache.jackrabbit.extractor classes, but I'm thinking of simply
> moving the entire package to jackrabbit-core and deprecating
> everything except the new Tika-based extractor. In fact I'd even go as
> far as changing the indexing code in jackrabbit-core to use the Tika
> Parser interface directly and only provide a backwards-compatibility
> layer for the TextExtractor classes we have.
>
> Thus Jackrabbit 1.6 would no longer contain a separate text-extractors
> jar, but all the existing TextExtractor classes would still be
> included. In Jackrabbit 2.0 we'd drop all the TextExtractors and only
> use Tika Parsers.

Hmm, this adds quite a few dependencies to jackrabbit-core.

What if we kept the dependency from jackrabbit-core to
jackrabbit-text-extractors at version 1.5 but at the same time flagged
it as optional? That would remove it from the transitive dependency
tree, but you'd still have it in the pom (until we remove it in 2.0).
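
Roughly what I have in mind for the jackrabbit-core pom, as a sketch only.
The exact version string (1.5.0) is my assumption for "version 1.5":

```xml
<!-- Sketch for jackrabbit-core/pom.xml; the version value is assumed -->
<dependency>
  <groupId>org.apache.jackrabbit</groupId>
  <artifactId>jackrabbit-text-extractors</artifactId>
  <version>1.5.0</version>
  <!-- optional: not pulled in transitively by projects that depend on jackrabbit-core -->
  <optional>true</optional>
</dependency>
```

Anyone who still relies on the old extractor classes would then have to
declare jackrabbit-text-extractors explicitly in their own pom.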

regards
 marcel
