lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: TIKA Errors Importing MS Word Documents into SOLR Cloud
Date Tue, 28 Feb 2012 02:53:35 GMT
You *probably* can update the Tika libraries in Solr, but it'll be "interesting"
to get all the right ones updated, since there are a bunch of them in Tika. And I
make no guarantees.

If it proves difficult, it's not too hard to write a SolrJ program that does
the Tika extraction and run it on a client totally separated from the Solr
server.
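
A rough sketch of the shape such a client could take, using only the JDK so
it stands alone. The names here (ClientSideIndexer, TextExtractor,
buildUpdate, and the "id" and "text" fields) are made up for illustration
and come from neither Tika nor SolrJ. In a real program the extractor would
presumably wrap a Tika call such as Tika.parseToString, and a SolrJ
SolrInputDocument would replace the hand-built JSON "add" command. The point
is only that a parser failure gets caught on the client, so a bad Word file
is skipped instead of blowing up inside the Solr server.

```java
import java.util.Optional;

/** Sketch only: extraction runs on the client; Solr just receives plain text. */
final class ClientSideIndexer {

    /** Hypothetical stand-in for the Tika call; not a Tika or SolrJ type. */
    interface TextExtractor {
        String extract(byte[] rawDocument) throws Exception;
    }

    /** Escapes a string for use inside a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                // Control characters must be written as \\uXXXX escapes in JSON.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Extracts text on the client and builds a JSON "add" command for Solr.
     * Returns empty if extraction fails, so one bad file does not stop the run.
     * The field names "id" and "text" are assumptions about the schema.
     */
    static Optional<String> buildUpdate(String id, byte[] raw, TextExtractor extractor) {
        String text;
        try {
            text = extractor.extract(raw);
        } catch (Exception e) {
            // Also covers runtime exceptions such as ArrayIndexOutOfBoundsException.
            System.err.println("Skipping " + id + ": " + e);
            return Optional.empty();
        }
        return Optional.of("{\"add\":{\"doc\":{\"id\":\"" + jsonEscape(id)
                + "\",\"text\":\"" + jsonEscape(text) + "\"}}}");
    }
}
```

The client would then send each returned command to the Solr update handler
and log the skipped ids for a later look. Since the Tika jars live only on
the client, they can be upgraded there without touching the server.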

Best
Erick

On Sun, Feb 26, 2012 at 7:33 PM, Matthew Parker
<mparker@apogeeintegration.com> wrote:
> I tried to import some documents into SOLR Cloud using Apache ManifoldCF.
>
> TIKA started throwing exceptions for various documents.
>
> The exception reads like the following:
>
> org.apache.solr.common.SolrException
> at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:213)
> ..........
>
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@d394424
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> ...........
> Caused by: java.lang.ArrayIndexOutOfBoundsException
> at java.lang.System.arraycopy(Native Method)
> at org.apache.poi.hwpf.usermodel.Picture.fillRawImageContent(Picture.java:363)
>
> It seems to be related to the following fix, now in Tika 1.1:
>
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51902
>
> Can the Tika libraries in the SOLR trunk be updated?
