jackrabbit-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Getting rid of jackrabbit-text-extractors
Date Tue, 07 Apr 2009 21:29:06 GMT

JCR-1878 is now resolved and Jackrabbit trunk is depending on Apache
Tika for text extraction functionality. Thus there is no longer much
need for jackrabbit-text-extractors as a standalone component. Anyone who
needs that functionality separately from jackrabbit-core should just
go for Tika directly.

For backwards compatibility with existing configurations (and
potential extensions) we still need the current
org.apache.jackrabbit.extractor classes, but I'm thinking of simply
moving the entire package to jackrabbit-core and deprecating
everything except the new Tika-based extractor. In fact I'd even go as
far as changing the indexing code in jackrabbit-core to use the Tika
Parser interface directly and only provide a backwards-compatibility
layer for the TextExtractor classes we have.
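
As a rough illustration of what such a backwards-compatibility layer
could look like, here is a minimal, self-contained sketch. The two
interfaces are simplified stand-ins invented for this example; they are
not the real org.apache.jackrabbit.extractor.TextExtractor or Tika
Parser signatures, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExtractorCompatSketch {

    /** Stand-in for a legacy extractor (assumed shape, not the real interface). */
    public interface LegacyTextExtractor {
        Reader extractText(InputStream stream, String type) throws IOException;
    }

    /** Stand-in for a parser-style interface (assumed shape, not Tika's Parser). */
    public interface SimpleParser {
        void parse(InputStream stream, String type, StringBuilder out) throws IOException;
    }

    /** Compatibility layer: wraps a legacy extractor so callers only see SimpleParser. */
    public static SimpleParser adapt(LegacyTextExtractor extractor) {
        return (stream, type, out) -> {
            try (Reader reader = extractor.extractText(stream, type)) {
                char[] buffer = new char[1024];
                int n;
                // Copy everything the legacy extractor produces into the output.
                while ((n = reader.read(buffer)) != -1) {
                    out.append(buffer, 0, n);
                }
            }
        };
    }

    /** Hypothetical legacy extractor that treats the input as UTF-8 plain text. */
    public static LegacyTextExtractor plainTextExtractor() {
        return (stream, type) -> new InputStreamReader(stream, StandardCharsets.UTF_8);
    }

    /** Stand-in for the indexing code, which talks to the parser interface only. */
    public static String index(SimpleParser parser, byte[] data, String type) {
        StringBuilder out = new StringBuilder();
        try {
            parser.parse(new ByteArrayInputStream(data), type, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        SimpleParser parser = adapt(plainTextExtractor());
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(index(parser, data, "text/plain"));
    }
}
```

The point of the sketch is only the direction of the dependency: the
indexing code knows about the parser-style interface, and existing
extractors keep working through the adapter.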

Thus Jackrabbit 1.6 would no longer contain a separate text-extractors
jar, but all the existing TextExtractor classes would still be
included. In Jackrabbit 2.0 we'd drop all the TextExtractors and only
use Tika Parsers.


Jukka Zitting
