lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Problem with pdf, upgrading Cell
Date Mon, 03 May 2010 22:24:33 GMT
I've opened https://issues.apache.org/jira/browse/SOLR-1902 to track this.  It is indeed a
bug somewhere (still investigating).  It seems that Tika is now picking an EmptyParser implementation
when trying to determine which parser to use, despite the fact that it properly identifies
the MIME Type.

-Grant
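
The symptom described in this thread (MIME type detected, metadata indexed, body text missing) matches what a silent fallback to an empty parser would produce. The sketch below is a toy model of that failure mode only. It is not Tika's or Solr's code: the class names, the registry dictionary, and the missing-entry cause are all assumptions made for illustration, and the real cause was still under investigation when this message was written.

```python
# Toy model only: every name here is hypothetical and none of it is Tika's API.

class EmptyParser:
    """Stand-in for a parser that accepts any document and extracts nothing."""

    def parse(self, data: bytes) -> str:
        return ""


class PdfParser:
    """Stand-in for a real PDF parser; here it simply decodes the bytes."""

    def parse(self, data: bytes) -> str:
        return data.decode("latin-1")


def pick_parser(mime_type: str, registry: dict):
    # Assumed failure mode: an unregistered type silently falls back
    # to EmptyParser instead of raising an error.
    return registry.get(mime_type, EmptyParser())


def extract(data: bytes, mime_type: str, registry: dict) -> dict:
    parser = pick_parser(mime_type, registry)
    return {
        "content_type": mime_type,    # detection still succeeds
        "stream_size": len(data),     # metadata is still populated
        "content": parser.parse(data),
        "parser": type(parser).__name__,
    }


doc = b"%PDF-1.4 sample text"
healthy = extract(doc, "application/pdf", {"application/pdf": PdfParser()})
broken = extract(doc, "application/pdf", {})  # registry lost its PDF entry
```

In this model, `broken` still carries `content_type` and `stream_size` but has an empty `content` field, which is the same shape as the reports quoted below. Checking which parser was selected, rather than only the detected type, is what separates the two cases.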

On May 3, 2010, at 5:36 PM, Grant Ingersoll wrote:

> I'm investigating.
> 
> On May 3, 2010, at 5:17 AM, Marc Ghorayeb wrote:
> 
>> 
>> Hi,
>> Grant, I confirm what Praveen said: no PDF I try works with the new Tika
>> and SVN versions. :(
>> Marc
>> 
>>> From: sagarwal@opentext.com
>>> To: solr-user@lucene.apache.org
>>> Date: Mon, 3 May 2010 13:05:24 +0530
>>> Subject: RE: Problem with pdf, upgrading Cell
>>> 
>>> Hello,
>>> 
>>> Please let me know if anybody figured out a way out of this issue. 
>>> 
>>> Thanks,
>>> Sandhya
>>> 
>>> -----Original Message-----
>>> From: Praveen Agrawal [mailto:pkalwar@gmail.com] 
>>> Sent: Friday, April 30, 2010 11:14 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Problem with pdf, upgrading Cell
>>> 
>>> Grant,
>>> You can try any of the sample PDFs that come in the /docs folder of the
>>> Solr 1.4 distribution. I tried 'Installing Solr in Tomcat.pdf', 'index.pdf',
>>> etc. Only metadata (i.e. stream_size and content_type), apart from my own
>>> literals, is indexed; the content is missing.
>>> 
>>> 
>>> On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>> 
>>>> Praveen and Marc,
>>>> 
>>>> Can you share the PDF (feel free to email my private email) that fails in
>>>> Solr?
>>>> 
>>>> Thanks,
>>>> Grant
>>>> 
>>>> 
>>>> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
>>>> 
>>>>> 
>>>>> Hi,
>>>>> Nope, I didn't get it to work. Just like you, I find that the command-line
>>>>> version of Tika extracts the content correctly, but once it is included in
>>>>> Solr, no content is extracted.
>>>>> What I have tried so far:
>>>>> - Updating the Tika libraries inside the public Solr 1.4 release: no luck.
>>>>> - Downloading the latest SVN version, compiling it, and starting from a
>>>>>   simple schema: still no luck.
>>>>> - Getting other versions compiled on Hudson (nightly builds) and testing
>>>>>   them as well: still no extraction.
>>>>> I sent a mail to the developers' mailing list, but they told me I should
>>>>> just mail here. I hope some developer reads this, because it's quite an
>>>>> important feature of Solr and somehow it got broken between the 1.4 release
>>>>> and the latest version in SVN.
>>>>> Marc
>>>> 
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>> 
>>>> Search the Lucene ecosystem using Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>> 
>>>> 
> 
> 
