From: Sandhya Agarwal
To: "solr-user@lucene.apache.org"
Date: Mon, 3 May 2010 13:05:24 +0530
Subject: RE: Problem with pdf, upgrading Cell

Hello,

Please let me know if anybody figured out a way out of this issue.

Thanks,
Sandhya

-----Original Message-----
From: Praveen Agrawal [mailto:pkalwar@gmail.com]
Sent: Friday, April 30, 2010 11:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell

Grant,

You can try any of the sample PDFs that come in the /docs folder of the Solr 1.4 distribution. I tried 'Installing Solr in Tomcat.pdf', 'index.pdf', etc. Only metadata (i.e. stream_size and content_type), apart from my own literals, is indexed, and the content is missing.

On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll wrote:

> Praveen and Marc,
>
> Can you share the PDF (feel free to email my private email) that fails in
> Solr?
>
> Thanks,
> Grant
>
> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
>
> > Hi
> > Nope, I didn't get it to work... Just like you, the command-line version of
> > Tika extracts the content correctly, but once included in Solr, no content
> > is extracted.
> > What I have tried so far:
> > - Updating the Tika libraries inside the Solr 1.4 public version; no luck there.
> > - Downloading the latest SVN version, compiling it, and starting from a
> >   simple schema; still no luck.
> > - Getting other versions compiled on Hudson (nightly builds) and testing
> >   them as well; still no extraction.
> > I sent a mail to the developers mailing list, but they told me I should
> > just mail here. I hope some developer reads this, because it's quite an
> > important feature of Solr and somehow it got broken between the 1.4 release
> > and the latest version on the SVN.
> > Marc
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
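
One way to narrow down the symptom described in this thread (metadata indexed, body text missing) might be to ask Solr Cell to return what it extracted instead of indexing it, and compare that with the output of the command-line Tika. The sketch below is only an illustration under assumptions: the base URL http://localhost:8983/solr, the handler path /update/extract (the path used in the Solr example config), the document id "doc1", and the helper names build_extract_url and post_pdf are all hypothetical choices, not something taken from the posters' setups.

```python
import urllib.request
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id=None, extract_only=True):
    """Build a request URL for Solr's ExtractingRequestHandler (Solr Cell).

    base_url and the /update/extract path are assumptions; adjust them to
    match the handler registered in your solrconfig.xml.
    """
    params = {}
    if extract_only:
        # extractOnly=true returns the extracted content in the response
        # without adding a document to the index.
        params["extractOnly"] = "true"
    else:
        if doc_id is None:
            raise ValueError("doc_id is required when indexing")
        # literal.<field> supplies a field value for the indexed document.
        params["literal.id"] = doc_id
        params["commit"] = "true"
    return f"{base_url}/update/extract?{urlencode(params)}"


def post_pdf(url, pdf_path):
    """POST the raw PDF bytes to the given URL and return the response body."""
    with open(pdf_path, "rb") as fh:
        payload = fh.read()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/pdf"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical local Solr instance and one of the sample PDFs named above.
    url = build_extract_url("http://localhost:8983/solr")
    print(post_pdf(url, "index.pdf"))
```

If the extractOnly response already lacks the body text, the problem presumably lies in the Tika libraries bundled with Solr rather than in the schema or field mapping; if the text is present there, the field mapping would be the next thing to check.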