lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Problem with pdf, upgrading Cell
Date Mon, 10 May 2010 14:37:59 GMT
I've integrated this into Solr's trunk: https://issues.apache.org/jira/browse/SOLR-1902


-Grant

On May 6, 2010, at 3:40 AM, Sandhya Agarwal wrote:

> Praveen,
> 
> You can get the latest code, containing the fix, from here:
> 
> http://lucene.apache.org/tika/source-repository.html
> 
> Thanks,
> Sandhya
> 
> -----Original Message-----
> From: Praveen Agrawal [mailto:pkalwar@gmail.com] 
> Sent: Wednesday, May 05, 2010 10:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with pdf, upgrading Cell
> 
> It reports that Jukka has resolved the issue (TIKA-419) and is now waiting for
> Grant to verify (SOLR-1902). But it seems the fix will only be available
> in version 0.8 of Tika.
> 
> If it solves the problem, is there a way to get it now, e.g. via SVN trunk
> access? All I see there is the 0.7 source zip to download.
> 
> Thanks.
> Praveen
> 
> 
> On Tue, May 4, 2010 at 3:59 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> 
>> Yes, it is loading the libraries, but they are in a different classloader
>> that apparently the new way Tika loads doesn't have access to.
>> 
>> -Grant
>> 
>> On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote:
>> 
>>> Hello,
>>> 
>>> 
>>> 
>>> But I see that the libraries are being loaded:
>>> 
>>> INFO: Adding specified lib dirs to ClassLoader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-compress-1.0.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-logging-1.1.1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/dom4j-1.6.1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/fontbox-1.1.0.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/jempbox-1.1.0.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/log4j-1.2.14.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/metadata-extractor-2.4.0-beta-1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/pdfbox-1.1.0.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-3.6.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-3.6.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-schemas-3.6.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-scratchpad-3.6.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tagsoup-1.2.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-core-0.7.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-parsers-0.7.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xercesImpl-2.8.1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xml-apis-1.0.b2.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xmlbeans-2.3.0.jar' to classloader
>>> May 4, 2010 12:50:16 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-cell-1.4.0.jar' to classloader
>>> May 4, 2010 12:50:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-clustering-1.4.0.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/carrot2-mini-3.1.0.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/commons-lang-2.4.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/ehcache-1.6.2.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/google-collections-1.0-rc2.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-core-asl-0.9.9-6.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-mapper-asl-0.9.9-6.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/log4j-1.2.14.jar' to classloader
>>> 
>>> 
>>> 
>>> Thanks,
>>> 
>>> Sandhya
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:gsiasf@gmail.com] On Behalf Of Grant Ingersoll
>>> Sent: Tuesday, May 04, 2010 6:13 AM
>>> Cc: solr-user@lucene.apache.org
>>> Subject: Re: Problem with pdf, upgrading Cell
>>> 
>>> 
>>> 
>>> Little more info... Seems to be a classloading issue. The tests pass, but
>>> they aren't loading the Tika libraries via the Solr ResourceLoader, whereas
>>> the example is. Marc, one thing to try is to unjar the Solr WAR file and put
>>> the Tika libs in there, as I bet it will then work. Note, however, I haven't
>>> tried this.
>>> 
>>> 
>>> 
>>> On May 3, 2010, at 6:24 PM, Grant Ingersoll wrote:
>>> 
>>> 
>>> 
>>>> I've opened https://issues.apache.org/jira/browse/SOLR-1902 to track this.
>>>> It is indeed a bug somewhere (still investigating). It seems that Tika is
>>>> now picking an EmptyParser implementation when trying to determine which
>>>> parser to use, despite the fact that it properly identifies the MIME type.
>>> 
>>>> 
>>> 
>>>> -Grant
>>> 
>>>> 
>>> 
>>>> On May 3, 2010, at 5:36 PM, Grant Ingersoll wrote:
>>> 
>>>> 
>>> 
>>>>> I'm investigating.
>>> 
>>>>> 
>>> 
>>>>> On May 3, 2010, at 5:17 AM, Marc Ghorayeb wrote:
>>> 
>>>>> 
>>> 
>>>>>> 
>>> 
>>>>>> Hi,
>>> 
>>>>>> Grant, I confirm what Praveen has said: any PDF I try does not work with
>>>>>> the new Tika and SVN versions. :(
>>> 
>>>>>> Marc
>>> 
>>>>>> 
>>> 
>>>>>>> From: sagarwal@opentext.com
>>> 
>>>>>>> To: solr-user@lucene.apache.org
>>> 
>>>>>>> Date: Mon, 3 May 2010 13:05:24 +0530
>>> 
>>>>>>> Subject: RE: Problem with pdf, upgrading Cell
>>> 
>>>>>>> 
>>> 
>>>>>>> Hello,
>>> 
>>>>>>> 
>>> 
>>>>>>> Please let me know if anybody figured out a way out of this issue.
>>> 
>>>>>>> 
>>> 
>>>>>>> Thanks,
>>> 
>>>>>>> Sandhya
>>> 
>>>>>>> 
>>> 
>>>>>>> -----Original Message-----
>>> 
>>>>>>> From: Praveen Agrawal [mailto:pkalwar@gmail.com]
>>> 
>>>>>>> Sent: Friday, April 30, 2010 11:14 PM
>>> 
>>>>>>> To: solr-user@lucene.apache.org
>>> 
>>>>>>> Subject: Re: Problem with pdf, upgrading Cell
>>> 
>>>>>>> 
>>> 
>>>>>>> Grant,
>>>>>>> You can try any of the sample PDFs that come in the /docs folder of the
>>>>>>> Solr 1.4 distribution. I tried 'Installing Solr in Tomcat.pdf', 'index.pdf',
>>>>>>> etc. Only metadata (i.e. stream_size and content_type, apart from my own
>>>>>>> literals) is indexed, and the content is missing.
>>> 
>>>>>>> 
>>> 
>>>>>>> 
>>> 
>>>>>>> On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>> 
>>>>>>> 
>>> 
>>>>>>>> Praveen and Marc,
>>> 
>>>>>>>> 
>>> 
>>>>>>>> Can you share the PDF (feel free to email my private email) that fails in
>>>>>>>> Solr?
>>> 
>>>>>>>> 
>>> 
>>>>>>>> Thanks,
>>> 
>>>>>>>> Grant
>>> 
>>>>>>>> 
>>> 
>>>>>>>> 
>>> 
>>>>>>>> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
>>> 
>>>>>>>> 
>>> 
>>>>>>>>> 
>>> 
>>>>>>>>> Hi,
>>>>>>>>> Nope, I didn't get it to work... Just like you, the command-line version of
>>>>>>>>> Tika extracts the content correctly, but once included in Solr, no content
>>>>>>>>> is extracted.
>>>>>>>>> What I have tried so far:
>>>>>>>>> - Updating the Tika libraries inside the public Solr 1.4 release: no luck
>>>>>>>>>   there.
>>>>>>>>> - Downloading the latest SVN version, compiling it, and starting from a
>>>>>>>>>   simple schema: still no luck.
>>>>>>>>> - Getting other versions compiled on Hudson (nightly builds) and testing
>>>>>>>>>   them as well: still no extraction.
>>>>>>>>> I sent a mail to the developers' mailing list, but they told me I should
>>>>>>>>> just mail here. I hope some developer reads this, because it's quite an
>>>>>>>>> important feature of Solr and somehow it got broken between the 1.4 release
>>>>>>>>> and the latest version in SVN.
>>>>>>>>> Marc
>>> 
>>> 
>>>>>>>> 
>>> 
>>>>>>>> --------------------------
>>> 
>>>>>>>> Grant Ingersoll
>>> 
>>>>>>>> http://www.lucidimagination.com/
>>> 
>>>>>>>> 
>>> 
>>>>>>>> Search the Lucene ecosystem using Solr/Lucene:
>>> 
>>>>>>>> http://www.lucidimagination.com/search
>>> 
>>>>>>>> 
>>> 
>>>>>>>> 
>>> 
>>>>>> 
>>> 
>>> 
>>>>> 
>>> 
>>> 
>>>>> 
>>> 
>>>> 
>>> 
>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
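
The classloading problem described in this thread (jars are added to one classloader, but the lookup happens through another that cannot see them) can be reproduced in isolation. The following is a minimal sketch using only the JDK, not Solr or Tika code: the class name, the resource name and the temporary directory are all made up for illustration, and the temporary directory merely stands in for a lib dir such as contrib/extraction/lib. It assumes that the lookup which fails is one that asks the parent classloader rather than the child that actually holds the jars.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassLoaderVisibility {
    // Hypothetical resource name, used only for this illustration.
    public static final String RESOURCE = "META-INF/services/example.Parser";

    /** Creates a temp directory containing one resource; stands in for a lib dir. */
    public static Path createLibDir() throws IOException {
        Path dir = Files.createTempDirectory("libdir");
        Path file = dir.resolve(RESOURCE);
        Files.createDirectories(file.getParent());
        Files.write(file, "example.PdfParser\n".getBytes(StandardCharsets.UTF_8));
        return dir;
    }

    /** Returns true if the given loader (or one of its parents) can see the resource. */
    public static boolean visibleFrom(ClassLoader loader) {
        return loader.getResource(RESOURCE) != null;
    }

    public static void main(String[] args) throws IOException {
        Path dir = createLibDir();
        ClassLoader parent = ClassLoader.getSystemClassLoader();
        // The child loader delegates to the parent first, then searches its own URLs.
        try (URLClassLoader child =
                 new URLClassLoader(new URL[] { dir.toUri().toURL() }, parent)) {
            // The parent never consults the child, so it cannot see the resource.
            System.out.println("visible from parent: " + visibleFrom(parent));
            System.out.println("visible from child:  " + visibleFrom(child));
        }
    }
}
```

Delegation only goes upward: a child loader can see everything its parent sees, but not the reverse. That is consistent with the log output above showing the jars being added while the parser lookup still comes up empty.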

