jackrabbit-users mailing list archives

From Jukka Zitting <jzitt...@adobe.com>
Subject Re: jackrabbit, lucene, tika ... and pdfbox
Date Thu, 10 Mar 2011 09:27:27 GMT

On 03/09/2011 04:51 AM, Kevin Jansz wrote:
> It's not a huge issue I guess as it seems with tika 0.9 (or 0.8.1?)
> the PDF parser issue will be resolved in which case I expect the
> code in org.apache.jackrabbit.core.query.pdf.* will disappear along
> with reference to it from the tika-config.xml.

Yes, that's what we've already done in trunk.

> I'm taking the time to mention it here in case it saves someone time
> and also to gauge if our view of lucene, tika and the parsers is
> incorrect - that future releases of jackrabbit may still include
> parsers other than DefaultParser and EmptyParser in its
> tika-config.xml.

Your view is correct. The idea is to avoid direct parser class 
references in jackrabbit-core and just rely on the service provider 
loader mechanism in Tika to pick up all the available parsers.
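
As a rough illustration of the service provider idea, here is a minimal
sketch using the JDK's standard java.util.ServiceLoader. The TextParser
interface and the ParserDiscovery class are hypothetical stand-ins made up
for this example; they are not Tika or Jackrabbit classes, and Tika's own
loader may differ in detail. The point is only that the calling code names
no concrete parser class and simply uses whatever the classpath provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ParserDiscovery {

    /** Hypothetical stand-in for a parser interface (not Tika's Parser). */
    public interface TextParser {
        String supportedType();
    }

    /**
     * Collects the media types of all TextParser implementations that are
     * registered through META-INF/services entries visible to the given
     * class loader. No concrete parser class is referenced here.
     */
    public static List<String> discover(ClassLoader loader) {
        List<String> types = new ArrayList<>();
        for (TextParser parser : ServiceLoader.load(TextParser.class, loader)) {
            types.add(parser.supportedType());
        }
        return types;
    }

    public static void main(String[] args) {
        // With no provider jars on the classpath the list is simply empty,
        // which corresponds to a deployment without parser libraries.
        System.out.println(discover(ParserDiscovery.class.getClassLoader()));
    }
}
```

Adding a jar that registers an implementation would make it show up in the
list without any change to this code, which is the property that lets the
core avoid direct parser class references.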

We also decided to move the tika-parsers dependency from jackrabbit-core 
to deployment packages like jackrabbit-webapp and jackrabbit-standalone. 
This should make it even easier for people to set up custom deployments 
with few or no parser libraries.

Jukka Zitting
