lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Pdf in Lucene?
Date Mon, 01 Dec 2008 22:40:12 GMT
I certainly don't either, since you haven't said what the actual
exception is.  If I had to guess, though, I would say it comes from the line

  Document document = LucenePDFDocument.getDocument(pdfFile);

and that the Lucene library expected by PDFBox is not the same version
of Lucene you are using.  I would suggest not relying on PDFBox to
create your Document; instead, look at the PDFBox calls you need to
make to extract the text, and then create the Document yourself.
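
One way to test the version-mismatch guess is to ask the JVM which jar each class was actually loaded from. The following is a minimal sketch using only JDK reflection; `ClasspathCheck` and `describe` are hypothetical names introduced for illustration, and a version string only appears if the jar's manifest happens to record one.

```java
import java.net.URL;
import java.security.CodeSource;

// Diagnostic sketch (hypothetical helper, not part of Lucene or PDFBox):
// reports where a class was loaded from and the version recorded in the
// manifest of its jar, when one is available.
final class ClasspathCheck
{
    static String describe(String className)
    {
        try
        {
            // initialize=false: locate the class without running its static initializers
            Class<?> clazz = Class.forName(className, false,
                ClasspathCheck.class.getClassLoader());

            // The code source is null for classes loaded by the bootstrap loader
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            URL location = (source != null) ? source.getLocation() : null;

            // Implementation-Version comes from the jar manifest and may be absent
            Package pkg = clazz.getPackage();
            String version = (pkg != null) ? pkg.getImplementationVersion() : null;

            return className + " -> "
                + ((location != null) ? location.toString() : "unknown location")
                + " (version " + ((version != null) ? version : "not recorded") + ")";
        }
        catch (ClassNotFoundException e)
        {
            return className + " -> not found on the classpath";
        }
    }

    public static void main(String[] args)
    {
        // Class names taken from the imports in the class quoted below
        System.out.println(describe("org.apache.lucene.index.IndexWriter"));
        System.out.println(describe("org.pdfbox.searchengine.lucene.LucenePDFDocument"));
    }
}
```

If the Lucene jar reported here is not the one your PDFBox release was built against, that would be consistent with the guess above; it does not replace posting the actual exception.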


On Dec 1, 2008, at 9:18 AM, tiziano bernardi wrote:

>
>
> This is my class. I use Eclipse and I don't get any errors. I don't
> understand where the problem is ....
>
>
> import java.io.File;
> import java.io.IOException;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.Hits;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
> import org.pdfbox.searchengine.lucene.LucenePDFDocument;
>
> public final class SimplePdfSearch
> {
>     private static final String PDF_FILE_PATH =
>         "C:\\Users\\Tiziano\\Desktop\\doc_di_prova\\prova.pdf";
>     private static final String SEARCH_TERM = "prova";
>
>     public static final void main(String[] args) throws IOException
>     {
>         Directory directory = null;
>
>         try
>         {
>             File pdfFile = new File(PDF_FILE_PATH);
>             Document document = LucenePDFDocument.getDocument(pdfFile);
>
>             directory = new RAMDirectory();
>
>             IndexWriter indexWriter = null;
>
>             try
>             {
>                 Analyzer analyzer = new StandardAnalyzer();
>                 indexWriter = new IndexWriter(directory, analyzer, true);
>
>                 indexWriter.addDocument(document);
>             }
>             finally
>             {
>                 if (indexWriter != null)
>                 {
>                     try
>                     {
>                         indexWriter.close();
>                     }
>                     catch (IOException ignore)
>                     {
>                         // Ignore
>                     }
>
>                     indexWriter = null;
>                 }
>             }
>
>             IndexSearcher indexSearcher = null;
>
>             try
>             {
>                 indexSearcher = new IndexSearcher(directory);
>
>                 Term term = new Term("contents", SEARCH_TERM);
>                 Query query = new TermQuery(term);
>
>                 Hits hits = indexSearcher.search(query);
>
>                 System.out.println((hits.length() != 0) ? "Found" : "Not Found");
>             }
>             finally
>             {
>                 if (indexSearcher != null)
>                 {
>                     try
>                     {
>                         indexSearcher.close();
>                     }
>                     catch (IOException ignore)
>                     {
>                         // Ignore
>                     }
>
>                     indexSearcher = null;
>                 }
>             }
>         }
>         finally
>         {
>             if (directory != null)
>             {
>                 try
>                 {
>                     directory.close();
>                 }
>                 catch (IOException ignore)
>                 {
>                     // Ignore
>                 }
>
>                 directory = null;
>             }
>         }
>     }
> }
>
> > From: gsingers@apache.org
> > To: java-user@lucene.apache.org
> > Subject: Re: Pdf in Lucene?
> > Date: Mon, 1 Dec 2008 08:22:58 -0500
> >
> > On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote:
> >
> > > I tried to use PDFBox but it gives me an error:
> > > that the versions of Lucene and PDFBox are incompatible.
> >
> > Lucene knows nothing about PDFBox, so I don't see how they could be
> > incompatible, unless you are referring to PDFBox's Lucene Document
> > creator, in which case, you should ask on the PDFBox mailing list. I
> > think, however, that it's pretty straightforward to create a Lucene
> > document from PDFBox, so you shouldn't need to rely on their version.
> >
> > Personally, I'd have a look at Tika (http://lucene.apache.org/tika),
> > which wraps PDFBox (and other extraction libraries) and gives you back
> > SAX-like events via a ContentHandler, which you can then use to create
> > Lucene documents. Else, I've been working on SOLR-284, which
> > integrates Tika into Solr, see
> > https://issues.apache.org/jira/browse/SOLR-284
> >
> > -Grant
> >
> > > I use PDFBox 0.7.3 and Lucene 2.1.0
> > >
> > > > Date: Mon, 1 Dec 2008 11:43:00 +0000
> > > > From: ian.lea@gmail.com
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: Pdf in Lucene?
> > > >
> > > > Hi
> > > >
> > > > Lucene only indexes text so you'll have to get the text out of
> > > > the PDF and feed it to lucene.
> > > >
> > > > Google for lucene pdf, or go straight to http://www.pdfbox.org/
> > > >
> > > > --
> > > > Ian.
> > > >
> > > > 2008/12/1 tiziano bernardi <dk1982@hotmail.it>:
> > > > >
> > > > > Hi,
> > > > > I want to index PDF files with Lucene. Is that possible?
> > > > > If so, how?
> > > > > Thanks Tiziano Bernardi
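
The Tika suggestion in the quoted message describes text arriving as SAX-like events on a ContentHandler. The sketch below shows only that collecting step, using the SAX classes that ship with the JDK. It makes several assumptions: `TextCollector` and `extract` are hypothetical names, and a hard-coded XHTML string parsed by the JDK's own SAX parser stands in for the events an extraction library would emit; neither Tika nor PDFBox is actually called.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that accumulates the character data of SAX events
// into one string suitable for a tokenized text field.
final class TextCollector extends DefaultHandler
{
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length)
    {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
    {
        // Keep words from adjacent elements from running together
        text.append(' ');
    }

    String getText()
    {
        // Collapse runs of whitespace left over from markup boundaries
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    // Stand-in for a real extraction run: parses an XHTML string with the
    // JDK SAX parser and returns the collected text.
    static String extract(String xhtml) throws Exception
    {
        TextCollector collector = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xhtml)), collector);
        return collector.getText();
    }

    public static void main(String[] args) throws Exception
    {
        String sample =
            "<html><body><p>documento di prova</p><p>seconda pagina</p></body></html>";
        System.out.println(extract(sample));
    }
}
```

The string returned by `getText()` is what you would then add to your own Lucene Document, for example under the "contents" field that the quoted class searches.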

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

