From: Tommaso Teofili
Date: Wed, 9 Mar 2011 14:15:08 +0100
Subject: Re: Solr Exception
To: dev@lucene.apache.org

Did you double-check the <lib> element in your solrconfig.xml that points to the Tika jar you're using?
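One way to follow up on this is to list what the <lib> elements actually resolve to. The Python sketch below is a hypothetical helper written for this thread, not part of Solr. It parses solrconfig.xml, returns the jar files each <lib> element would match, and then filters them down to the Tika jars. The matching rules are assumptions you should check against your Solr version:

- relative dir and path values are resolved against the core's instance directory;
- regex must match the whole file name;
- a dir without regex contributes every .jar file in it.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def resolve_lib_jars(solrconfig_xml, instance_dir):
    """List the jar files that the <lib> elements of a solrconfig.xml would match.

    Hypothetical helper, not part of Solr. Assumed semantics: relative paths are
    resolved against the core's instance directory, `regex` must match the
    whole file name, and a `dir` without `regex` contributes every *.jar file.
    """
    root = ET.fromstring(solrconfig_xml)
    base = Path(instance_dir)
    jars = []
    for lib in root.iter("lib"):
        path = lib.get("path")
        if path:
            # <lib path="..."/> names a single file directly.
            candidate = base / path
            if candidate.is_file():
                jars.append(str(candidate))
            else:
                print(f"warning: <lib path={path!r}> not found")
            continue
        dir_attr = lib.get("dir", "")
        directory = base / dir_attr
        if not directory.is_dir():
            print(f"warning: <lib dir={dir_attr!r}> not found")
            continue
        # Assumed default when no regex is given: every .jar in the directory.
        pattern = re.compile(lib.get("regex") or r".*\.jar")
        for entry in sorted(directory.iterdir()):
            if entry.is_file() and pattern.fullmatch(entry.name):
                jars.append(str(entry))
    return jars


def tika_jars(jars):
    """Keep only the jars whose file name starts with 'tika-'."""
    return [jar for jar in jars if Path(jar).name.startswith("tika-")]
```

For example, run `print(tika_jars(resolve_lib_jars(Path("conf/solrconfig.xml").read_text(), ".")))` from the core's instance directory to print the Tika jars it finds. An empty list, or jars from an older Tika release, would suggest that the running core is not loading the Tika bundled with Solr 3.1.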

2011/3/9 Deepak Singh <deepaks@praumtech.com>

I downloaded apache-solr-3.1, but it still gives the Tika exception.


On Wed, Mar 9, 2011 at 5:11 PM, Deepak Singh <deepaks@praumtech.com> wrote:
Oh, thanks for the better solution.


On Wed, Mar 9, 2011 at 4:36 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

Hi,

These are all bugs in Apache Tika, not Solr. Some of them are already fixed in later Tika versions, so you may want to try the soon-to-be-released Solr 3.1, which bundles a newer Tika.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
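Uwe's point suggests checking which Tika release is really being loaded. This is a minimal Python sketch that assumes jar files are named tika-<module>-<version>.jar. That naming is an assumption, not something Solr enforces. You give it a list of jar file names, for example a listing of the directory your <lib> element points to. It reports mixed Tika versions and any version older than a minimum you supply. You choose the minimum; nothing here says which Tika release fixes which of the errors reported below.

```python
import re
from collections import defaultdict

# Assumed naming scheme: tika-<module>-<version>[-suffix].jar, e.g. tika-core-0.8.jar
TIKA_JAR = re.compile(r"tika-([a-z]+)-(\d+(?:\.\d+)*)(?:-[\w.]+)?\.jar")


def version_tuple(version):
    """'0.10' -> (0, 10), so versions compare numerically rather than as strings."""
    return tuple(int(part) for part in version.split("."))


def tika_versions(jar_names):
    """Map each Tika version string to the Tika modules found with that version."""
    found = defaultdict(list)
    for name in jar_names:
        match = TIKA_JAR.fullmatch(name)
        if match:
            module, version = match.groups()
            found[version].append(module)
    return dict(found)


def check_tika(jar_names, minimum):
    """Return warnings for mixed Tika versions or versions older than `minimum`."""
    versions = tika_versions(jar_names)
    warnings = []
    if len(versions) > 1:
        warnings.append(f"mixed Tika versions on the classpath: {sorted(versions)}")
    for version in sorted(versions, key=version_tuple):
        if version_tuple(version) < version_tuple(minimum):
            modules = ", ".join(versions[version])
            warnings.append(f"Tika {version} ({modules}) is older than {minimum}")
    return warnings
```

The sketch compares versions as integer tuples so that 0.10 correctly sorts after 0.9. A plain string comparison would get that order wrong.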

From: Deepak Singh [mailto:deepaks@praumtech.com]
Sent: Wednesday, March 09, 2011 12:03 PM
To: dev@lucene.apache.org
Subject: Re: Solr Exception

HTTP ERROR 500 (INTERNAL SERVER ERROR)

For DOC files:
org.apache.tika.exception.TikaException:
- Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: org.apache.poi.hpsf.IllegalPropertySetDataException: The property set claims to have a size of 16 bytes. However, it exceeds 16 bytes.

- TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: java.io.IOException: block[ 0 ] already removed - does your POIFS have circular or duplicate block references?


For PDF files:
org.apache.tika.exception.TikaException:
- Unexpected RuntimeException from org.apache.tika.parser.Pdfparser@1b4cd65
Caused by: java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot be cast to org.pdfbox.cos.COSDictionar
Caused by: java.lang.NullPointerException

- Unable to extract PDF content

HTTP ERROR 400 (BAD REQUEST)
- This error occurs when some fields are missing from the schema:
ERROR: unknown field 'language' (other examples: content_status, description, version)
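An "unknown field" error means the update contained a field name that schema.xml does not declare. This Python sketch is a hypothetical helper, not a Solr API. It lists which incoming field names would be rejected. A name counts as declared if it equals a <field name="..."> or matches a <dynamicField> pattern with a single leading or trailing '*'. It ignores every other schema feature, so treat its answer as a first check only.

```python
import xml.etree.ElementTree as ET


def schema_fields(schema_xml):
    """Collect exact field names and dynamic-field patterns from a schema.xml string."""
    root = ET.fromstring(schema_xml)
    exact = {f.get("name") for f in root.iter("field")}
    dynamic = [d.get("name") for d in root.iter("dynamicField")]
    return exact, dynamic


def is_declared(name, exact, dynamic):
    """True if `name` is declared exactly or matches a dynamic-field pattern."""
    if name in exact:
        return True
    for pattern in dynamic:
        # Dynamic-field names carry a single '*' at the start or at the end.
        if pattern.startswith("*") and name.endswith(pattern[1:]):
            return True
        if pattern.endswith("*") and name.startswith(pattern[:-1]):
            return True
    return False


def undeclared_fields(field_names, schema_xml):
    """Return the incoming field names that have no matching declaration."""
    exact, dynamic = schema_fields(schema_xml)
    return [n for n in field_names if not is_declared(n, exact, dynamic)]
```

Each name it reports needs either its own <field> declaration or a matching <dynamicField>. If the documents go through the ExtractingRequestHandler (Solr Cell), its uprefix parameter can also prefix undeclared field names so they land in a dynamic field. See the Solr Cell documentation for your version.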

On Wed, Mar 9, 2011 at 4:19 PM, Gora Mohanty <gora@mimirtech.com> wrote:

Hi,

This is probably better directed to the user list. Also, please provide details of the exceptions from your log files.

Regards,
Gora
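Gora asks for the exception details from the log files. The exception chain is more useful than a pasted full log. In a Java stack trace, the "Caused by:" lines run from the outermost wrapper to the innermost exception, so the last one is the root cause. This is a minimal Python sketch. It assumes the text you pass in holds a single stack trace; with several traces, only the last trace's root cause is returned.

```python
import re

# Matches e.g. "Caused by: java.io.IOException: block[ 0 ] already removed ..."
CAUSED_BY = re.compile(r"^\s*Caused by:\s*([\w.$]+)(?::\s*(.*))?$")


def cause_chain(log_text):
    """Return (exception class, message) for each 'Caused by:' line, outermost first."""
    chain = []
    for line in log_text.splitlines():
        match = CAUSED_BY.match(line)
        if match:
            chain.append((match.group(1), match.group(2) or ""))
    return chain


def root_cause(log_text):
    """The last 'Caused by:' entry is the innermost exception; None if there is none."""
    chain = cause_chain(log_text)
    return chain[-1] if chain else None
```

Applied to the second DOC error in this thread, root_cause returns the java.io.IOException about POIFS block references. That is the exception Tika wrapped in its TIKA-198 error.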
