lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Solr Exception
Date Wed, 09 Mar 2011 13:15:25 GMT
Then you should open a bug report on TIKA and attach the files that do not
parse. Often the problem is in one of TIKA's underlying parser libraries,
such as Apache POI, in which case there is nothing the TIKA developers can
do. Another TIKA issue may already cover the same problem, so search the
issue tracker first!

 

Uwe

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Deepak Singh [mailto:deepaks@praumtech.com] 
Sent: Wednesday, March 09, 2011 2:09 PM
To: dev@lucene.apache.org
Subject: Re: Solr Exception

 


I downloaded apache-solr-3.1, but it is still giving a TIKA exception.

On Wed, Mar 9, 2011 at 5:11 PM, Deepak Singh <deepaks@praumtech.com> wrote:

Oh, thanks for the better solution.

 

On Wed, Mar 9, 2011 at 4:36 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

Hi,

 

These are all bugs in Apache TIKA, not Solr. Some of them are already fixed
in later TIKA versions, so you may want to try the soon-to-be-released
Solr 3.1, which bundles a newer TIKA.

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Deepak Singh [mailto:deepaks@praumtech.com] 
Sent: Wednesday, March 09, 2011 12:03 PM
To: dev@lucene.apache.org
Subject: Re: Solr Exception

 


HTTP ERROR: 500 (INTERNAL SERVER ERROR)

For DOC files:
org.apache.tika.exception.TikaException:
-Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: org.apache.poi.hpsf.IllegalPropertySetDataException: The property
set claims to have a size of 16 bytes. However, it exceeds 16 bytes.

-TIKA-198: Illegal IOException from
org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: java.io.IOException: block[ 0 ] already removed - does your POIFS
have circular or duplicate block references?


For PDF files:
org.apache.tika.exception.TikaException : 
-Unexpected RuntimeException from org.apache.tika.parser.Pdfparser@1b4cd65
Caused by: java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot be
cast to org.pdfbox.cos.COSDictionar
Caused by: java.lang.NullPointerException

 

-Unable to extract PDF content

HTTP ERROR: 400 (BAD REQUEST)
- This error occurs when some fields are missing
ERROR: unknown field 'language' (e.g. content_status, description, version)
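
One possible way around the 400 error, sketched under assumptions: Solr's ExtractingRequestHandler accepts a `uprefix` parameter that prefixes any extracted field not defined in the schema, so metadata such as 'language' can be routed to a dynamic field instead of being rejected. The snippet below only builds the request URL. The base URL, the document id, and the `ignored_` prefix (which presumes a matching `ignored_*` dynamic field in schema.xml) are hypothetical values, and `build_extract_url` is an illustrative helper, not part of Solr.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, unknown_prefix="ignored_"):
    """Build a URL for Solr's /update/extract handler.

    unknown_prefix is passed as `uprefix`, so fields that are not in the
    schema are renamed with that prefix. This assumes the schema defines
    a dynamic field matching the prefix (e.g. ignored_*).
    """
    params = {
        "literal.id": doc_id,       # unique key for the uploaded document
        "uprefix": unknown_prefix,  # prefix for fields unknown to the schema
        "commit": "true",           # commit right after the upload
    }
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr instance and document id.
    print(build_extract_url("http://localhost:8983/solr", "doc1"))
```

The file itself would still have to be sent as the request body, for example with a multipart POST. The Tika parse failures behind the 500 errors are not affected by this setting.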

 

On Wed, Mar 9, 2011 at 4:19 PM, Gora Mohanty <gora@mimirtech.com> wrote:

Hi,

This is probably better directed to the user list. Also, please provide
details of the exceptions from your log files.

Regards,
Gora

 

 

 

