lucene-java-user mailing list archives

From tiziano bernardi <dk1...@hotmail.it>
Subject RE: Pdf in Lucene?
Date Mon, 01 Dec 2008 14:18:04 GMT


This is my class. I use Eclipse and I don't get any compile errors, so I don't understand where the problem is...
 
 
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.pdfbox.searchengine.lucene.LucenePDFDocument;

public final class SimplePdfSearch
{
    private static final String PDF_FILE_PATH = "C:\\Users\\Tiziano\\Desktop\\doc_di_prova\\prova.pdf";
    private static final String SEARCH_TERM = "prova";

    public static final void main(String[] args) throws IOException
    {
        Directory directory = null;

        try
        {
            // Let PDFBox extract the text and wrap it in a Lucene Document
            File pdfFile = new File(PDF_FILE_PATH);
            Document document = LucenePDFDocument.getDocument(pdfFile);

            // Build the index in memory
            directory = new RAMDirectory();

            IndexWriter indexWriter = null;

            try
            {
                Analyzer analyzer = new StandardAnalyzer();
                // true = create a new index (overwriting any existing one)
                indexWriter = new IndexWriter(directory, analyzer, true);

                indexWriter.addDocument(document);
            }
            finally
            {
                if (indexWriter != null)
                {
                    try
                    {
                        indexWriter.close();
                    }
                    catch (IOException ignore)
                    {
                        // Ignore
                    }

                    indexWriter = null;
                }
            }

            IndexSearcher indexSearcher = null;

            try
            {
                indexSearcher = new IndexSearcher(directory);

                // TermQuery is not analyzed: SEARCH_TERM must match an indexed
                // token exactly (StandardAnalyzer lowercases tokens)
                Term term = new Term("contents", SEARCH_TERM);
                Query query = new TermQuery(term);

                Hits hits = indexSearcher.search(query);

                System.out.println((hits.length() != 0) ? "Found" : "Not Found");
            }
            finally
            {
                if (indexSearcher != null)
                {
                    try
                    {
                        indexSearcher.close();
                    }
                    catch (IOException ignore)
                    {
                        // Ignore
                    }

                    indexSearcher = null;
                }
            }
        }
        finally
        {
            if (directory != null)
            {
                try
                {
                    directory.close();
                }
                catch (IOException ignore)
                {
                    // Ignore
                }

                directory = null;
            }
        }
    }
}

> From: gsingers@apache.org
> To: java-user@lucene.apache.org
> Subject: Re: Pdf in Lucene?
> Date: Mon, 1 Dec 2008 08:22:58 -0500
>
> On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote:
>
> > I tried to use PDFBox but it gives me an error:
> > the versions of Lucene and PDFBox are incompatible.
>
> Lucene knows nothing about PDFBox, so I don't see how they could be
> incompatible, unless you are referring to PDFBox's Lucene Document
> creator, in which case you should ask on the PDFBox mailing list. I
> think, however, that it's pretty straightforward to create a Lucene
> document from PDFBox, so you shouldn't need to rely on their version.
>
> Personally, I'd have a look at Tika (http://lucene.apache.org/tika),
> which wraps PDFBox (and other extraction libraries) and gives you back
> SAX-like events via a ContentHandler, which you can then use to create
> Lucene documents. Otherwise, I've been working on SOLR-284, which
> integrates Tika into Solr; see https://issues.apache.org/jira/browse/SOLR-284
>
> -Grant
>
> > I use PDFBox 0.7.3 and Lucene 2.1.0.
> >
> > > Date: Mon, 1 Dec 2008 11:43:00 +0000
> > > From: ian.lea@gmail.com
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Pdf in Lucene?
> > >
> > > Hi
> > >
> > > Lucene only indexes text, so you'll have to get the text out of the PDF
> > > and feed it to Lucene.
> > >
> > > Google for lucene pdf, or go straight to http://www.pdfbox.org/
> > >
> > > --
> > > Ian.
> > >
> > > 2008/12/1 tiziano bernardi <dk1982@hotmail.it>:
> > > >
> > > > Hi,
> > > > I want to index PDF files with Lucene. Is it possible?
> > > > How?
> > > > Thanks, Tiziano Bernardi
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
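Ian's and Grant's replies describe the same pipeline: get plain text out of the PDF first, then give that text to Lucene as a field. Grant mentions that Tika reports extracted content as SAX events through a ContentHandler. What follows is a minimal sketch of the receiving side of that idea using only the JDK: a small XHTML string is run through the JDK's own SAX parser as a stand-in for Tika's output. This is an assumption for illustration; neither Tika nor PDFBox is used here, and `TextCollector` and `extractText` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTextExtraction {

    /** Hypothetical handler: collects the character events into plain text. */
    static final class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may split one run of text across several calls
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Keep words from adjacent paragraphs apart
            if ("p".equals(qName)) {
                text.append(' ');
            }
        }

        String getText() {
            return text.toString().trim();
        }
    }

    /** Feeds XHTML through the JDK SAX parser (stand-in for Tika) and returns the text. */
    public static String extractText(String xhtml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextCollector handler = new TextCollector();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.getText();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p>Documento di prova</p>"
                + "<p>Second paragraph</p></body></html>";
        // This string is what would then be stored in a field such as "contents"
        System.out.println(extractText(xhtml)); // prints: Documento di prova Second paragraph
    }
}
```

With Tika, the same kind of handler would be handed to Tika's parser instead of the JDK's XML parser; the exact parse method depends on the Tika version, so check its documentation before wiring it up.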

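On the search side of SimplePdfSearch: a TermQuery is not passed through the analyzer, so it only matches when the term equals a token exactly as it was indexed, in the same field. StandardAnalyzer lowercases tokens, so "prova" can match while "Prova" cannot. The toy in-memory index below is a stdlib-only illustration of that rule, not Lucene code: its `analyze` method is a rough simplification of StandardAnalyzer (no stop words, no special token types), and `ToyIndex` is a hypothetical name. Like the original class, it assumes the PDF text ends up in a field called "contents"; if LucenePDFDocument writes the body text to a different field, the query will miss.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ToyIndex {
    // field name -> set of indexed terms (a single document, for brevity)
    private final Map<String, Set<String>> fields = new HashMap<>();

    /** Rough stand-in for StandardAnalyzer: lowercase, split on non-alphanumerics. */
    public static String[] analyze(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /** Index-time: the field text IS analyzed. */
    public void addField(String field, String text) {
        Set<String> terms = fields.computeIfAbsent(field, k -> new HashSet<>());
        for (String token : analyze(text)) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
    }

    /** Query-time, like a TermQuery: the term is looked up verbatim, NOT analyzed. */
    public boolean termQuery(String field, String term) {
        Set<String> terms = fields.get(field);
        return terms != null && terms.contains(term);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addField("contents", "Documento di Prova, pagina 1");
        System.out.println(index.termQuery("contents", "prova") ? "Found" : "Not Found"); // Found
        System.out.println(index.termQuery("contents", "Prova") ? "Found" : "Not Found"); // Not Found
        System.out.println(index.termQuery("text", "prova") ? "Found" : "Not Found");     // Not Found
    }
}
```

In Lucene itself, QueryParser (constructed with a field name and an analyzer) runs the query text through the analyzer before building the query, which avoids this kind of case mismatch.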