jackrabbit-dev mailing list archives

From Paul Skinner <shedloadsofb...@hotmail.com>
Subject RE: office 2007 files
Date Thu, 28 May 2009 09:56:21 GMT

Patching source code can become a logistical nightmare, because you are then responsible
for keeping the patched source up to date with future releases and for testing it.
IMO it is better to let Jackrabbit's Hudson do that for me ;-)

My preferred option would therefore be to use the jackrabbit-tika component. I do confess
that I had not previously looked at this when it was available.

> From: jukka.zitting@gmail.com
> Date: Thu, 28 May 2009 11:49:50 +0200
> Subject: Re: office 2007 files
> To: dev@jackrabbit.apache.org
> Hi,
> On Thu, May 28, 2009 at 11:40 AM, Paul Skinner
> <shedloadsofbeer@hotmail.com> wrote:
> > If both poi msoffice text extractor and Apache Tika Office 2007 support is
> > being targeted at 2.0 then does this mean that anyone using 1.x will not be
> > able to index Office 2007 docs?
> If you are running on Java 5 or higher, you can still apply the
> JCR-1887 patch to get Office 2007 support in Jackrabbit 1.6. The main
> problem with the change is that it causes Jackrabbit not to compile
> on Java 1.4, which is still the base platform for the Jackrabbit 1.x
> releases. Unless someone comes up with a way to fix this (the Tika
> option provided a workaround, but there are other issues with that
> approach, see JCR-1878), we'll need to rely on people patching the
> sources themselves.
> Alternatively, we can restore the jackrabbit-tika component I had
> earlier in the Jackrabbit sandbox and release that for people who want
> Office 2007 support (and all the other good Tika stuff) without having
> to patch sources. The nice thing about this option is that it would
> also work for many previous Jackrabbit 1.x releases.
> What would work best for you?
> BR,
> Jukka Zitting
