From: Sandhya Agarwal
To: "solr-user@lucene.apache.org"
Date: Tue, 4 May 2010 12:58:09 +0530
Subject: RE: Problem with pdf, upgrading Cell

Hello,

But I see that the libraries are being loaded:

INFO: Adding specified lib dirs to ClassLoader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-compress-1.0.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-logging-1.1.1.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/dom4j-1.6.1.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/fontbox-1.1.0.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.1.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/jempbox-1.1.0.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/log4j-1.2.14.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/metadata-extractor-2.4.0-beta-1.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/pdfbox-1.1.0.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-3.6.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-3.6.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-schemas-3.6.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-scratchpad-3.6.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tagsoup-1.2.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-core-0.7.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-parsers-0.7.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xercesImpl-2.8.1.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xml-apis-1.0.b2.jar' to classloader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xmlbeans-2.3.0.jar' to classloader
May 4, 2010 12:50:16 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-cell-1.4.0.jar' to classloader
May 4, 2010 12:50:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-clustering-1.4.0.jar' to classloader
May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/carrot2-mini-3.1.0.jar' to classloader
May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/commons-lang-2.4.jar' to classloader
May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/ehcache-1.6.2.jar' to classloader
May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/google-collections-1.0-rc2.jar' to classloader
May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-core-asl-0.9.9-6.jar' to classloader
May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-mapper-asl-0.9.9-6.jar' to classloader
May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/log4j-1.2.14.jar' to classloader

Thanks,
Sandhya

-----Original Message-----
From: Grant Ingersoll [mailto:gsiasf@gmail.com] On Behalf Of Grant Ingersoll
Sent: Tuesday, May 04, 2010 6:13 AM
Cc: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell

Little more info... Seems to be a classloading issue. The tests pass, but they aren't loading the Tika libraries via the Solr ResourceLoader, whereas the example is.
Marc, one thing to try is to unjar the Solr WAR file and put the Tika libs in there, as I bet it will then work. Note, however, I haven't tried this.

On May 3, 2010, at 6:24 PM, Grant Ingersoll wrote:

> I've opened https://issues.apache.org/jira/browse/SOLR-1902 to track this. It is indeed a bug somewhere (still investigating). It seems that Tika is now picking an EmptyParser implementation when trying to determine which parser to use, despite the fact that it properly identifies the MIME type.
>
> -Grant
>
> On May 3, 2010, at 5:36 PM, Grant Ingersoll wrote:
>
>> I'm investigating.
>>
>> On May 3, 2010, at 5:17 AM, Marc Ghorayeb wrote:
>>
>>> Hi,
>>> Grant, I confirm what Praveen has said: any PDF I try does not work with the new Tika and SVN versions. :(
>>> Marc
>>>
>>>> From: sagarwal@opentext.com
>>>> To: solr-user@lucene.apache.org
>>>> Date: Mon, 3 May 2010 13:05:24 +0530
>>>> Subject: RE: Problem with pdf, upgrading Cell
>>>>
>>>> Hello,
>>>>
>>>> Please let me know if anybody figured out a way out of this issue.
>>>>
>>>> Thanks,
>>>> Sandhya
>>>>
>>>> -----Original Message-----
>>>> From: Praveen Agrawal [mailto:pkalwar@gmail.com]
>>>> Sent: Friday, April 30, 2010 11:14 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Problem with pdf, upgrading Cell
>>>>
>>>> Grant,
>>>> You can try any of the sample PDFs that come in the /docs folder of the Solr 1.4 distribution. I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf', etc. Only metadata (i.e. stream_size and content_type, apart from my own literals) is indexed, and the content is missing.
>>>>
>>>> On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll wrote:
>>>>
>>>>> Praveen and Marc,
>>>>>
>>>>> Can you share the PDF (feel free to email my private email) that fails in Solr?
>>>>>
>>>>> Thanks,
>>>>> Grant
>>>>>
>>>>> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
>>>>>
>>>>>> Hi
>>>>>> Nope, I didn't get it to work... Just like you, the command-line version of Tika extracts the content correctly, but once included in Solr, no content is extracted.
>>>>>> What I have tried so far:
>>>>>> - Updating the Tika libraries inside the Solr 1.4 public version: no luck there.
>>>>>> - Downloading the latest SVN version, compiling it, and starting from a simple schema: still no luck.
>>>>>> - Getting other versions compiled on Hudson (nightly builds) and testing them as well: still no extraction.
>>>>>> I sent a mail to the developers' mailing list, but they told me I should just mail here. I hope some developer reads this, because it's quite an important feature of Solr and somehow it got broken between the 1.4 release and the latest version in SVN.
>>>>>> Marc
>>>>>
>>>>> --------------------------
>>>>> Grant Ingersoll
>>>>> http://www.lucidimagination.com/
>>>>>
>>>>> Search the Lucene ecosystem using Solr/Lucene:
>>>>> http://www.lucidimagination.com/search
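
The classloader log quoted in this thread can be checked mechanically rather than by eye. The following is a minimal Python sketch, assuming the `INFO: Adding 'file:...jar' to classloader` line format shown in Sandhya's message. The helper names `jars_added` and `missing_prefixes` and the list of required prefixes are illustrative assumptions, not part of Solr or Tika. It only reports which jars the log mentions; as the thread shows, a jar being listed does not by itself prove that Tika resolves its parsers through that classloader.

```python
import re

# Matches the jar path in log lines of the (assumed) form:
#   INFO: Adding 'file:/C:/.../lib/tika-core-0.7.jar' to classloader
ADDING_RE = re.compile(r"INFO: Adding 'file:(?P<path>[^']+\.jar)' to classloader")


def jars_added(log_text):
    """Return the jar file names the log reports as added to the classloader."""
    return [m.group("path").rsplit("/", 1)[-1] for m in ADDING_RE.finditer(log_text)]


def missing_prefixes(jar_names, required_prefixes):
    """Return the required name prefixes that no listed jar starts with."""
    return [
        prefix
        for prefix in required_prefixes
        if not any(name.startswith(prefix) for name in jar_names)
    ]


# Three lines copied from the log in this thread (hypothetical reduced sample).
sample_log = """\
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/pdfbox-1.1.0.jar' to classloader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-core-0.7.jar' to classloader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-parsers-0.7.jar' to classloader
"""

jars = jars_added(sample_log)
print(jars)
# The prefix list below is an illustrative choice, not an official requirement.
print(missing_prefixes(jars, ["tika-core", "tika-parsers", "pdfbox", "fontbox"]))
```

Run against the full startup log, an empty result from `missing_prefixes` would match Sandhya's observation that the extraction libraries are listed, which is consistent with the problem lying in how the classes are resolved rather than in whether the jars are found.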