From: tiziano bernardi
To: java lucene domande
Subject: RE: Pdf in Lucene?
Date: Mon, 1 Dec 2008 15:18:04 +0100

This is my class. I use Eclipse and it shows no errors. I do not understand where the problem is.

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.pdfbox.searchengine.lucene.LucenePDFDocument;

public final class SimplePdfSearch
{
    private static final String PDF_FILE_PATH = "C:\\Users\\Tiziano\\Desktop\\doc_di_prova\\prova.pdf";
    private static final String SEARCH_TERM = "prova";

    public static final void main(String[] args) throws IOException
    {
        Directory directory = null;
        try
        {
            // Convert the PDF into a Lucene document with PDFBox's helper class
            File pdfFile = new File(PDF_FILE_PATH);
            Document document = LucenePDFDocument.getDocument(pdfFile);

            // Index the document into an in-memory directory
            directory = new RAMDirectory();
            IndexWriter indexWriter = null;
            try
            {
                Analyzer analyzer = new StandardAnalyzer();
                indexWriter = new IndexWriter(directory, analyzer, true);
                indexWriter.addDocument(document);
            }
            finally
            {
                if (indexWriter != null)
                {
                    try
                    {
                        indexWriter.close();
                    }
                    catch (IOException ignore)
                    {
                        // Ignore
                    }
                    indexWriter = null;
                }
            }

            // Look up the exact term in the "contents" field
            IndexSearcher indexSearcher = null;
            try
            {
                indexSearcher = new IndexSearcher(directory);
                Term term = new Term("contents", SEARCH_TERM);
                Query query = new TermQuery(term);
                Hits hits = indexSearcher.search(query);
                System.out.println((hits.length() != 0) ? "Found" : "Not Found");
            }
            finally
            {
                if (indexSearcher != null)
                {
                    try
                    {
                        indexSearcher.close();
                    }
                    catch (IOException ignore)
                    {
                        // Ignore
                    }
                    indexSearcher = null;
                }
            }
        }
        finally
        {
            if (directory != null)
            {
                try
                {
                    directory.close();
                }
                catch (IOException ignore)
                {
                    // Ignore
                }
                directory = null;
            }
        }
    }
}

> From: gsingers@apache.org
> To: java-user@lucene.apache.org
> Subject: Re: Pdf in Lucene?
> Date: Mon, 1 Dec 2008 08:22:58 -0500
>
> On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote:
>
> > I tried to use pdfbox but it gives me an error
> > saying that the versions of lucene and pdfbox are incompatible.
>
> Lucene knows nothing about PDFBox, so I don't see how they could be
> incompatible, unless you are referring to PDFBox's Lucene Document
> creator, in which case you should ask on the PDFBox mailing list. I
> think, however, that it's pretty straightforward to create a Lucene
> document from PDFBox, so you shouldn't need to rely on their version.
>
> Personally, I'd have a look at Tika (http://lucene.apache.org/tika),
> which wraps PDFBox (and other extraction libraries) and gives you back
> SAX-like events via a ContentHandler, which you can then use to create
> Lucene documents. Else, I've been working on SOLR-284, which
> integrates Tika into Solr, see https://issues.apache.org/jira/browse/SOLR-284
>
> -Grant
>
> > I use pdf box 0.7.3 and lucene 2.1.0
> >
> > > Date: Mon, 1 Dec 2008 11:43:00 +0000
> > > From: ian.lea@gmail.com
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Pdf in Lucene?
> > >
> > > Hi
> > >
> > > Lucene only indexes text so you'll have to get the text out of the PDF
> > > and feed it to lucene.
> > >
> > > Google for lucene pdf, or go straight to http://www.pdfbox.org/
> > >
> > > --
> > > Ian.
> > >
> > > 2008/12/1 tiziano bernardi:
> > > > Hi,
> > > > I want to index PDF files with Lucene. Is that possible?
> > > > How?
> > > > Thanks, Tiziano Bernardi
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ