jackrabbit-users mailing list archives

From John Langley <langleyatw...@gmail.com>
Subject Re: jackrabbit, lucene, tika ... and pdfbox
Date Fri, 11 Mar 2011 19:03:43 GMT
I have to apologize for taking the "lazy way out"... but Jukka, is the
change you referred to as being in trunk going to be part of 2.2.5?

Thanks in advance for any pointers.

-- Langley

On Thu, Mar 10, 2011 at 4:27 AM, Jukka Zitting <jzitting@adobe.com> wrote:

> Hi,
>
>
> On 03/09/2011 04:51 AM, Kevin Jansz wrote:
>
>> It's not a huge issue I guess as it seems with tika 0.9 (or 0.8.1?)
>> the PDF parser issue will be resolved in which case I expect the
>> code in org.apache.jackrabbit.core.query.pdf.* will disappear along
>> with reference to it from the tika-config.xml.
>>
>
> Yes, that's what we've already done in trunk.
>
>
>> I'm taking the time to mention it here in case it saves someone time
>> and also to gauge if our view of lucene, tika and the parsers is
>> incorrect - that future releases of jackrabbit may still include
>> parsers other than DefaultParser and EmptyParser in its
>> tika-config.xml.
>>
>
> Your view is correct. The idea is to avoid direct parser class references
> in jackrabbit-core and just rely on the service provider loader mechanism in
> Tika to pick up all the available parsers.
>
> We also decided to move the tika-parsers dependency from jackrabbit-core to
> deployment packages like jackrabbit-webapp and jackrabbit-standalone. This
> should make it even easier for people to set up custom deployments with few
> or no parser libraries.
>
> --
> Jukka Zitting
>
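
For readers unfamiliar with the service provider loader mechanism mentioned in the quoted reply, here is a minimal sketch of the general idea using the JDK's java.util.ServiceLoader. It is not Tika's or Jackrabbit's actual code: the DocumentParser interface and the ParserDiscoverySketch class are hypothetical names invented for illustration. The point is that the core code names only an interface, and implementations are picked up from whatever provider jars happen to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ParserDiscoverySketch {

    /** Hypothetical stand-in for a parser interface; this is NOT Tika's API. */
    public interface DocumentParser {
        String supportedType();
    }

    /**
     * Collects every DocumentParser implementation that a jar on the
     * classpath has registered in a META-INF/services provider file.
     * No concrete parser class is referenced here.
     */
    public static List<DocumentParser> discoverParsers() {
        List<DocumentParser> found = new ArrayList<>();
        for (DocumentParser parser : ServiceLoader.load(DocumentParser.class)) {
            found.add(parser);
        }
        return found;
    }

    public static void main(String[] args) {
        // With no provider jars deployed, the list is simply empty.
        List<DocumentParser> parsers = discoverParsers();
        System.out.println("Discovered parsers: " + parsers.size());
    }
}
```

Under this pattern, a deployment that ships no provider jars ends up with an empty parser list rather than a class-loading failure, which is roughly why moving the parser dependency out of the core module makes slimmed-down deployments easier.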
