lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Pdf in Lucene?
Date Mon, 01 Dec 2008 13:22:58 GMT

On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote:

>
> I tried to use PDFBox, but it gives me an error saying
> that the versions of Lucene and PDFBox are incompatible.

Lucene knows nothing about PDFBox, so I don't see how they could be
incompatible, unless you are referring to PDFBox's Lucene Document
creator, in which case you should ask on the PDFBox mailing list. I
think, however, that it's pretty straightforward to create a Lucene
document from the text PDFBox extracts, so you shouldn't need to rely
on their version.

Personally, I'd have a look at Tika (http://lucene.apache.org/tika),
which wraps PDFBox (and other extraction libraries) and gives you back
SAX-like events via a ContentHandler, which you can then use to create
Lucene documents. Alternatively, I've been working on SOLR-284, which
integrates Tika into Solr; see https://issues.apache.org/jira/browse/SOLR-284
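A rough sketch of the ContentHandler idea, using only the JDK: Tika reports extracted content as XHTML SAX events, so a handler that collects character data per element is enough to build index fields. This sketch does not call Tika, PDFBox or Lucene. It assumes a hand-written XHTML string in place of real parser output, runs the JDK's own SAX parser over it, and collects the title and body text into a plain Map standing in for a Lucene Document. The class name PdfTextHandler and the field names "title" and "contents" are made up for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler: collects title and body text from XHTML SAX
 * events, the kind of event stream Tika produces for a parsed document.
 */
public class PdfTextHandler extends DefaultHandler {
    private final StringBuilder title = new StringBuilder();
    private final StringBuilder body = new StringBuilder();
    private boolean inTitle = false;
    private boolean inBody = false;

    // Prefer the namespace-local name; fall back to the qualified name.
    private static String nameOf(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String name = nameOf(localName, qName);
        if (name.equals("title")) inTitle = true;
        if (name.equals("body")) inBody = true;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String name = nameOf(localName, qName);
        if (name.equals("title")) inTitle = false;
        if (name.equals("body")) inBody = false;
        // Keep words from adjacent paragraphs apart.
        if (name.equals("p") && inBody) body.append(' ');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may split text across several calls, so always append.
        if (inTitle) title.append(ch, start, length);
        else if (inBody) body.append(ch, start, length);
    }

    /** Field name to text; a stand-in for a Lucene Document (assumption). */
    public Map<String, String> toFields() {
        Map<String, String> fields = new HashMap<>();
        fields.put("title", title.toString().trim());
        fields.put("contents", body.toString().trim());
        return fields;
    }

    /** Runs the JDK SAX parser over an XHTML string in place of Tika. */
    public static Map<String, String> extract(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            PdfTextHandler handler = new PdfTextHandler();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.toFields();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Pdf in Lucene?</title></head>"
                + "<body><p>Lucene only indexes text.</p><p>Extract it first.</p></body></html>";
        Map<String, String> fields = extract(xhtml);
        System.out.println(fields.get("title"));    // Pdf in Lucene?
        System.out.println(fields.get("contents")); // Lucene only indexes text. Extract it first.
    }
}
```

In a real setup the handler (a DefaultHandler is a ContentHandler) would be passed to a Tika parser instead of the JDK's SAXParser, and each map entry would become a field on a Lucene Document.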

-Grant

>
> I use PDFBox 0.7.3 and Lucene 2.1.0.
>
> > Date: Mon, 1 Dec 2008 11:43:00 +0000
> > From: ian.lea@gmail.com
> > To: java-user@lucene.apache.org
> > Subject: Re: Pdf in Lucene?
> >
> > Hi
> >
> > Lucene only indexes text, so you'll have to get the text out of the PDF
> > and feed it to Lucene.
> >
> > Google for lucene pdf, or go straight to http://www.pdfbox.org/
> >
> > --
> > Ian.
> >
> > 2008/12/1 tiziano bernardi <dk1982@hotmail.it>:
> > > Hi,
> > > I want to index PDF files with Lucene. Is it possible?
> > > How?
> > > Thanks, Tiziano Bernardi

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org