lucene-java-user mailing list archives

From tiziano bernardi <dk1...@hotmail.it>
Subject RE: Pdf in Lucene?
Date Tue, 02 Dec 2008 09:03:27 GMT


This is the exception:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.lucene.document.Document.add(Lorg/apache/lucene/document/Field;)V
at org.pdfbox.searchengine.lucene.LucenePDFDocument.addUnindexedField(LucenePDFDocument.java:224)
at org.pdfbox.searchengine.lucene.LucenePDFDocument.convertDocument(LucenePDFDocument.java:265)
at org.pdfbox.searchengine.lucene.LucenePDFDocument.getDocument(LucenePDFDocument.java:377)
at SimplePdfSearch.main(SimplePdfSearch.java:30)
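The error names the missing method by its JVM descriptor: add(Lorg/apache/lucene/document/Field;)V means "void add(org.apache.lucene.document.Field)". Eclipse reports no errors because SimplePdfSearch never makes that call itself. The call sits inside the already-compiled PDFBox jar (LucenePDFDocument.addUnindexedField), and the JVM only links it at run time, against whatever Lucene jar is on the classpath. Below is a minimal, JDK-only diagnostic sketch (Java 8+). The class MissingMethodCheck and its methods are hypothetical and are not part of Lucene or PDFBox. It decodes such a descriptor, checks by reflection whether the loaded class has that exact method, and reports which jar the class came from:

```java
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Lucene or PDFBox.
final class MissingMethodCheck {

    // Converts "name(descriptor)" as printed by NoSuchMethodError into
    // Java-like source form, e.g. "void add(org.apache.lucene.document.Field)".
    static String decode(String nameAndDescriptor) {
        int open = nameAndDescriptor.indexOf('(');
        int close = nameAndDescriptor.indexOf(')');
        String name = nameAndDescriptor.substring(0, open);
        String params = nameAndDescriptor.substring(open + 1, close);
        String returnDesc = nameAndDescriptor.substring(close + 1);

        List<String> paramTypes = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < params.length()) {
            paramTypes.add(readType(params, pos));
        }
        String returnType = readType(returnDesc, new int[] {0});
        return returnType + " " + name + "(" + String.join(", ", paramTypes) + ")";
    }

    // Reads one type descriptor starting at pos[0] and advances pos[0] past it.
    private static String readType(String s, int[] pos) {
        char c = s.charAt(pos[0]++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return readType(s, pos) + "[]";
            case 'L': {
                int end = s.indexOf(';', pos[0]);
                String className = s.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return className;
            }
            default:
                throw new IllegalArgumentException("Unexpected descriptor character: " + c);
        }
    }

    // True if className can be loaded and has a public method with exactly these
    // parameter types. Class.getMethod matches parameter types exactly, as the JVM
    // does when it links a call, so add(Object) does not satisfy add(String).
    // Parameter types must be class names; primitives are not handled here.
    static boolean hasExactMethod(String className, String methodName, String... paramTypeNames) {
        try {
            Class<?>[] paramTypes = new Class<?>[paramTypeNames.length];
            for (int i = 0; i < paramTypeNames.length; i++) {
                paramTypes[i] = Class.forName(paramTypeNames[i]);
            }
            Class.forName(className).getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    // Reports the jar or directory a class was loaded from.
    static String loadedFrom(String className) {
        try {
            CodeSource source = Class.forName(className).getProtectionDomain().getCodeSource();
            return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String doc = "org.apache.lucene.document.Document";
        System.out.println("PDFBox calls: " + decode("add(Lorg/apache/lucene/document/Field;)V"));
        System.out.println("Document loaded from: " + loadedFrom(doc));
        System.out.println("Exact add(Field) present: "
                + hasExactMethod(doc, "add", "org.apache.lucene.document.Field"));
    }
}
```

Run it with the same classpath as SimplePdfSearch. If the exact add(Field) check comes back false, the Lucene jar being loaded does not offer the method signature the PDFBox jar was compiled against. That is consistent with the version mismatch Grant suggests in the reply quoted below.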
 
Thank you for the time you have spent.

> From: gsingers@apache.org
> To: java-user@lucene.apache.org
> Subject: Re: Pdf in Lucene?
> Date: Mon, 1 Dec 2008 17:40:12 -0500
>
> I certainly don't either, since you haven't said what the actual
> exception is. If I had to guess, though, I would say it is the line
>
>     Document document = LucenePDFDocument.getDocument
>
> and that the Lucene library expected by PDFBox is not the same version
> of Lucene you are using. I would suggest not relying on PDFBox to
> create your document, and instead look at the PDFBox calls that you
> need to make to then create your Document.
>
> On Dec 1, 2008, at 9:18 AM, tiziano bernardi wrote:
>
>> This is my class. I use Eclipse and I don't get any errors. I don't
>> understand where the problem is...
>>
>> import java.io.File;
>> import java.io.IOException;
>>
>> import org.apache.lucene.analysis.Analyzer;
>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.index.IndexWriter;
>> import org.apache.lucene.index.Term;
>> import org.apache.lucene.search.Hits;
>> import org.apache.lucene.search.IndexSearcher;
>> import org.apache.lucene.search.Query;
>> import org.apache.lucene.search.TermQuery;
>> import org.apache.lucene.store.Directory;
>> import org.apache.lucene.store.RAMDirectory;
>> import org.pdfbox.searchengine.lucene.LucenePDFDocument;
>>
>> public final class SimplePdfSearch {
>>     private static final String PDF_FILE_PATH =
>>             "C:\\Users\\Tiziano\\Desktop\\doc_di_prova\\prova.pdf";
>>     private static final String SEARCH_TERM = "prova";
>>
>>     public static final void main(String[] args) throws IOException {
>>         Directory directory = null;
>>         try {
>>             File pdfFile = new File(PDF_FILE_PATH);
>>             Document document = LucenePDFDocument.getDocument(pdfFile);
>>
>>             directory = new RAMDirectory();
>>
>>             IndexWriter indexWriter = null;
>>             try {
>>                 Analyzer analyzer = new StandardAnalyzer();
>>                 indexWriter = new IndexWriter(directory, analyzer, true);
>>                 indexWriter.addDocument(document);
>>             } finally {
>>                 if (indexWriter != null) {
>>                     try {
>>                         indexWriter.close();
>>                     } catch (IOException ignore) {
>>                         // Ignore
>>                     }
>>                     indexWriter = null;
>>                 }
>>             }
>>
>>             IndexSearcher indexSearcher = null;
>>             try {
>>                 indexSearcher = new IndexSearcher(directory);
>>
>>                 Term term = new Term("contents", SEARCH_TERM);
>>                 Query query = new TermQuery(term);
>>
>>                 Hits hits = indexSearcher.search(query);
>>                 System.out.println((hits.length() != 0) ? "Found" : "Not Found");
>>             } finally {
>>                 if (indexSearcher != null) {
>>                     try {
>>                         indexSearcher.close();
>>                     } catch (IOException ignore) {
>>                         // Ignore
>>                     }
>>                     indexSearcher = null;
>>                 }
>>             }
>>         } finally {
>>             if (directory != null) {
>>                 try {
>>                     directory.close();
>>                 } catch (IOException ignore) {
>>                     // Ignore
>>                 }
>>                 directory = null;
>>             }
>>         }
>>     }
>> }
>>
>>> From: gsingers@apache.org
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Pdf in Lucene?
>>> Date: Mon, 1 Dec 2008 08:22:58 -0500
>>>
>>> On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote:
>>>
>>>> I tried to use PDFBox, but it gives me an error: the versions of
>>>> Lucene and PDFBox are incompatible.
>>>
>>> Lucene knows nothing about PDFBox, so I don't see how they could be
>>> incompatible, unless you are referring to PDFBox's Lucene Document
>>> creator, in which case you should ask on the PDFBox mailing list. I
>>> think, however, that it's pretty straightforward to create a Lucene
>>> document from PDFBox, so you shouldn't need to rely on their version.
>>>
>>> Personally, I'd have a look at Tika (http://lucene.apache.org/tika),
>>> which wraps PDFBox (and other extraction libraries) and gives you back
>>> SAX-like events via a ContentHandler, which you can then use to create
>>> Lucene documents. Alternatively, I've been working on SOLR-284, which
>>> integrates Tika into Solr; see
>>> https://issues.apache.org/jira/browse/SOLR-284
>>>
>>> -Grant
>>>
>>>> I use PDFBox 0.7.3 and Lucene 2.1.0.
>>>>
>>>>> Date: Mon, 1 Dec 2008 11:43:00 +0000
>>>>> From: ian.lea@gmail.com
>>>>> To: java-user@lucene.apache.org
>>>>> Subject: Re: Pdf in Lucene?
>>>>>
>>>>> Hi
>>>>>
>>>>> Lucene only indexes text, so you'll have to get the text out of the
>>>>> PDF and feed it to Lucene.
>>>>>
>>>>> Google for lucene pdf, or go straight to http://www.pdfbox.org/
>>>>>
>>>>> --
>>>>> Ian.
>>>>>
>>>>> 2008/12/1 tiziano bernardi <dk1982@hotmail.it>:
>>>>>> Hi,
>>>>>> I want to index PDF files with Lucene. Is it possible?
>>>>>> How?
>>>>>>
>>>>>> Thanks, Tiziano Bernardi
>>>
>>> --------------------------
>>> Grant Ingersoll
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
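
Both Ian and Grant suggest extracting the text with PDFBox yourself and building the Lucene Document in your own code, so that only your own Lucene version is involved. Below is a minimal sketch of that approach. It assumes the PDFBox 0.7.x package names seen in the stack trace (org.pdfbox.pdmodel.PDDocument, org.pdfbox.util.PDFTextStripper) and the Lucene 2.x Field constructor that takes Field.Store and Field.Index. The class PdfToLucene and the "path" field are hypothetical. The "contents" field name is used only because SimplePdfSearch searches that field. Encrypted PDFs are not handled.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

// Hypothetical stand-in for LucenePDFDocument.getDocument(File): PDFBox is used
// only to extract text; the Lucene Document is built with the Lucene jar on the
// classpath (Lucene 2.x Field API assumed).
final class PdfToLucene {

    static Document toDocument(File pdfFile) throws IOException {
        String text = extractText(pdfFile);

        Document doc = new Document();
        // Stored, not indexed: lets a search hit report which file matched.
        doc.add(new Field("path", pdfFile.getPath(), Field.Store.YES, Field.Index.NO));
        // Tokenized by the IndexWriter's analyzer, not stored: the field that
        // SimplePdfSearch's TermQuery looks in.
        doc.add(new Field("contents", text, Field.Store.NO, Field.Index.TOKENIZED));
        return doc;
    }

    // Extracts the plain text of the whole PDF (PDFBox 0.7.x API assumed).
    static String extractText(File pdfFile) throws IOException {
        InputStream in = new FileInputStream(pdfFile);
        PDDocument pdf = null;
        try {
            pdf = PDDocument.load(in);
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(pdf);
        } finally {
            if (pdf != null) {
                pdf.close();
            }
            in.close();
        }
    }
}
```

In SimplePdfSearch, LucenePDFDocument.getDocument(pdfFile) would become PdfToLucene.toDocument(pdfFile), and the org.pdfbox.searchengine.lucene import can be dropped. Because this class is compiled against the same Lucene jar it runs with, a change in Lucene's Document or Field API shows up as a compile error in Eclipse, not as a NoSuchMethodError at run time.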
