lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Solr Exception
Date Wed, 09 Mar 2011 11:06:40 GMT
Hi,

 

These are all bugs in Apache Tika, not Solr. Some of them are already fixed
in later Tika versions, so you may want to try the soon-to-be-released
Solr 3.1, which bundles a newer Tika.

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Deepak Singh [mailto:deepaks@praumtech.com] 
Sent: Wednesday, March 09, 2011 12:03 PM
To: dev@lucene.apache.org
Subject: Re: Solr Exception

 


HTTP ERROR: 500 (INTERNAL SERVER ERROR)

For DOC files:
org.apache.tika.exception.TikaException:
-Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: org.apache.poi.hpsf.IllegalPropertySetDataException: The property
set claims to have a size of 16 bytes. However, it exceeds 16 bytes.

-TIKA-198: Illegal IOException from
org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: java.io.IOException: block[ 0 ] already removed - does your POIFS
have circular or duplicate block references?


For PDF files:
org.apache.tika.exception.TikaException:
-Unexpected RuntimeException from org.apache.tika.parser.Pdfparser@1b4cd65
Caused by: java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot be
cast to org.pdfbox.cos.COSDictionar
Caused by: java.lang.NullPointerException

 

-Unable to extract PDF content

HTTP ERROR: 400 (BAD REQUEST)
-This error occurs when some fields are missing
ERROR: unknown field 'language' (e.g. content_status, description, version)

 

On Wed, Mar 9, 2011 at 4:19 PM, Gora Mohanty <gora@mimirtech.com> wrote:

Hi,

This is probably better directed to the user list. Also, please provide
details of the exceptions from your log files.

Regards,
Gora

 

