lucene-solr-user mailing list archives

From Praveen Agrawal <pkal...@gmail.com>
Subject Re: Problem with pdf, upgrading Cell
Date Wed, 05 May 2010 17:18:31 GMT
JIRA reports that Jukka has resolved the issue (TIKA-419), which is now waiting
for Grant to verify (SOLR-1902). But it seems the fix will only be available
in version 0.8 of Tika.

If it solves the problem, is there a way to get it now, e.g. through SVN trunk
access? All I see there is the 0.7 source zip to download.

Thanks.
Praveen


On Tue, May 4, 2010 at 3:59 PM, Grant Ingersoll <gsingers@apache.org> wrote:

> Yes, it is loading the libraries, but they are in a different classloader
> that apparently the new way Tika loads doesn't have access to.
>
> -Grant
>
> On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote:
>
> > Hello,
> >
> > But I see that the libraries are being loaded:
> >
> > INFO: Adding specified lib dirs to ClassLoader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-compress-1.0.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-logging-1.1.1.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/dom4j-1.6.1.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/fontbox-1.1.0.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.1.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/jempbox-1.1.0.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/log4j-1.2.14.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/metadata-extractor-2.4.0-beta-1.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/pdfbox-1.1.0.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-3.6.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-3.6.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-schemas-3.6.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-scratchpad-3.6.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tagsoup-1.2.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-core-0.7.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-parsers-0.7.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xercesImpl-2.8.1.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xml-apis-1.0.b2.jar' to classloader
> > May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xmlbeans-2.3.0.jar' to classloader
> > May 4, 2010 12:50:16 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-cell-1.4.0.jar' to classloader
> > May 4, 2010 12:50:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-clustering-1.4.0.jar' to classloader
> > May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/carrot2-mini-3.1.0.jar' to classloader
> > May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/commons-lang-2.4.jar' to classloader
> > May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/ehcache-1.6.2.jar' to classloader
> > May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/google-collections-1.0-rc2.jar' to classloader
> > May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-core-asl-0.9.9-6.jar' to classloader
> > May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-mapper-asl-0.9.9-6.jar' to classloader
> > May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
> > INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/log4j-1.2.14.jar' to classloader
> >
> > Thanks,
> > Sandhya
> >
> > -----Original Message-----
> > From: Grant Ingersoll [mailto:gsiasf@gmail.com] On Behalf Of Grant Ingersoll
> > Sent: Tuesday, May 04, 2010 6:13 AM
> > Cc: solr-user@lucene.apache.org
> > Subject: Re: Problem with pdf, upgrading Cell
> >
> > Little more info... Seems to be a classloading issue. The tests pass,
> > but they aren't loading the Tika libraries via the Solr ResourceLoader,
> > whereas the example is. Marc, one thing to try is to unjar the Solr WAR
> > file and put the Tika libs in there, as I bet it will then work. Note,
> > however, I haven't tried this.
> >
> > On May 3, 2010, at 6:24 PM, Grant Ingersoll wrote:
> >
> > >> I've opened https://issues.apache.org/jira/browse/SOLR-1902 to track
> > >> this. It is indeed a bug somewhere (still investigating). It seems that
> > >> Tika is now picking an EmptyParser implementation when trying to determine
> > >> which parser to use, despite the fact that it properly identifies the MIME
> > >> type.
> > >>
> > >> -Grant
> > >>
> > >> On May 3, 2010, at 5:36 PM, Grant Ingersoll wrote:
> > >>
> > >>> I'm investigating.
> > >>>
> > >>> On May 3, 2010, at 5:17 AM, Marc Ghorayeb wrote:
> > >>>
> > >>>> Hi,
> > >>>> Grant, I confirm what Praveen has said: any PDF I try does not work
> > >>>> with the new Tika and SVN versions. :(
> > >>>> Marc
> > >>>>
> > >>>>> From: sagarwal@opentext.com
> > >>>>> To: solr-user@lucene.apache.org
> > >>>>> Date: Mon, 3 May 2010 13:05:24 +0530
> > >>>>> Subject: RE: Problem with pdf, upgrading Cell
> > >>>>>
> > >>>>> Hello,
> > >>>>>
> > >>>>> Please let me know if anybody has figured out a way around this issue.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Sandhya
> > >>>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Praveen Agrawal [mailto:pkalwar@gmail.com]
> > >>>>> Sent: Friday, April 30, 2010 11:14 PM
> > >>>>> To: solr-user@lucene.apache.org
> > >>>>> Subject: Re: Problem with pdf, upgrading Cell
> > >>>>>
> > >>>>> Grant,
> > >>>>> You can try any of the sample PDFs that come in the /docs folder of the
> > >>>>> Solr 1.4 distribution. I tried 'Installing Solr in Tomcat.pdf',
> > >>>>> 'index.pdf', etc. Only metadata (i.e. stream_size, content_type), apart
> > >>>>> from my own literals, is indexed, and the content is missing.
> > >>>>>
> > >>>>> On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> > >>>>>
> > >>>>>> Praveen and Marc,
> > >>>>>>
> > >>>>>> Can you share the PDF (feel free to email my private email) that fails
> > >>>>>> in Solr?
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Grant
> > >>>>>>
> > >>>>>> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
> > >>>>>>
> > >>>>>>> Hi,
> > >>>>>>> Nope, I didn't get it to work. Just like for you, the command-line
> > >>>>>>> version of Tika extracts the content correctly, but once it is
> > >>>>>>> included in Solr, no content is extracted.
> > >>>>>>> What I have tried so far:
> > >>>>>>> - Updating the Tika libraries inside the public Solr 1.4 release: no
> > >>>>>>>   luck there.
> > >>>>>>> - Downloading the latest SVN version, compiling it, and starting from
> > >>>>>>>   a simple schema: still no luck.
> > >>>>>>> - Getting other versions compiled on Hudson (nightly builds) and
> > >>>>>>>   testing them as well: still no extraction.
> > >>>>>>> I sent a mail to the developers' mailing list, but they told me I
> > >>>>>>> should just mail here. I hope some developer reads this, because it's
> > >>>>>>> quite an important feature of Solr and somehow it got broken between
> > >>>>>>> the 1.4 release and the latest version in SVN.
> > >>>>>>> Marc
> >
> >>>>>> --------------------------
> >
> >>>>>> Grant Ingersoll
> >
> >>>>>> http://www.lucidimagination.com/
> >
> >>>>>>
> >
> >>>>>> Search the Lucene ecosystem using Solr/Lucene:
> >
> >>>>>> http://www.lucidimagination.com/search
>
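
The visibility problem Grant describes above (the jars are added to one classloader, but the code doing the lookup uses a loader that cannot reach them) can be illustrated with a minimal Java sketch. It does not use Solr or Tika at all: the class name `ClassLoaderVisibility` and the marker file `demo-parsers.txt` are hypothetical stand-ins, and the temporary directory only plays the role of a lib directory such as contrib/extraction/lib. It is an illustration of the general mechanism under those assumptions, not a reproduction of SOLR-1902.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassLoaderVisibility {
    // Hypothetical marker file standing in for something a library must discover.
    public static final String RESOURCE = "demo-parsers.txt";

    /** Returns true if the given loader (or one of its parents) can see the marker. */
    public static boolean canSee(ClassLoader loader) {
        return loader.getResource(RESOURCE) != null;
    }

    /** Builds a child loader over a fresh temp directory that contains the marker. */
    public static URLClassLoader childLoader(ClassLoader parent) throws IOException {
        Path dir = Files.createTempDirectory("extraction-lib");
        Files.write(dir.resolve(RESOURCE), "pdf-parser".getBytes(StandardCharsets.UTF_8));
        // A directory URL ends with '/', so URLClassLoader searches it as a folder.
        return new URLClassLoader(new URL[] { dir.toUri().toURL() }, parent);
    }

    public static void main(String[] args) throws IOException {
        ClassLoader parent = ClassLoaderVisibility.class.getClassLoader();
        try (URLClassLoader child = childLoader(parent)) {
            // Lookups delegate upward (child to parent), never downward.
            System.out.println("parent sees marker: " + canSee(parent));
            System.out.println("child sees marker:  " + canSee(child));
        }
    }
}
```

Because delegation only runs from child to parent, code that performs its lookup through the parent loader never finds what was added to the child. That is consistent with the log showing every jar being added and extraction still returning no content.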
