lucene-java-user mailing list archives

From tiziano bernardi <dk1...@hotmail.it>
Subject RE: Pdf in Lucene?
Date Thu, 04 Dec 2008 14:50:34 GMT

I put your code inside a main method.
I have imported the required libraries, but I still get errors.
First error:

parser.parse();

Syntax error on token "parse", Identifier expected after this token
Second error:
cd.close();
Syntax error on token "close", Identifier expected after this token
Third error:
doc.add(new Field("content", text, Field.Store.NO,
Field.Index.TOKENIZED));
Multiple markers at this line
- Syntax error on tokens, delete these tokens
- Syntax error on token "add", = expected after this token
- Syntax error on token ",", delete this token
- Syntax error on token(s), misplaced construct(s)
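
The three errors share a pattern: each is reported on a line that only calls a method (parser.parse(), cd.close(), doc.add(...)), while the declarations between them are accepted. One possible cause, stated as an assumption because the full class is not shown in this message, is that the lines ended up directly in the class body rather than between the braces of main. A class body may contain only declarations, so a bare method call placed there is rejected by the compiler. The sketch below illustrates the rule with JDK classes only; StatementPlacement, SAMPLE and readAll are hypothetical names, and a StringReader stands in for the PDFBox objects.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class StatementPlacement {

    // Legal at class level: a field declaration with an initializer.
    static final String SAMPLE = "text extracted from a PDF";

    // Not legal at class level: a bare statement such as a method call.
    // Uncommenting the next line makes the class fail to compile, because
    // a class body may only contain declarations.
    // SAMPLE.length();

    // Statements belong inside a method body, where they compile normally.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } finally {
            reader.close(); // a method-call statement, valid here
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for parsing a PDF and extracting its text.
        String text = readAll(new StringReader(SAMPLE));
        System.out.println(text);
    }
}
```

If the same check is applied to the PDFBox code, every line from the FileInputStream declaration to doc.add(...) should sit between the opening and closing braces of main, and main should declare or handle IOException.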

> Date: Thu, 4 Dec 2008 19:17:01 +0530
> From: kalanir@gmail.com
> To: java-user@lucene.apache.org
> Subject: Re: Pdf in Lucene?
>
> Hi Tiziano,
>
> What is the error you got? I think you can get the text easily using the
> code shown below.
>
> FileInputStream fi = new FileInputStream(new File("sample.pdf"));
> PDFParser parser = new PDFParser(fi);
> parser.parse();
> COSDocument cd = parser.getDocument();
> PDFTextStripper stripper = new PDFTextStripper();
> String text = stripper.getText(new PDDocument(cd));
> cd.close();
>
> After getting the value for text you can simply create the Lucene document.
>
> Document doc = new Document();
> doc.add(new Field("content", text, Field.Store.NO,
> Field.Index.TOKENIZED));
>
> On Thu, Dec 4, 2008 at 6:20 PM, tiziano bernardi <dk1982@hotmail.it> wrote:
>
>> Thanks, very kind.
>> I've tried that code, but it does not work for me.
>> Could you please send me a simple working class that uses it?
>> Thanks
>>
>>> Date: Thu, 4 Dec 2008 15:19:26 +0530
>>> From: kalanir@gmail.com
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Pdf in Lucene?
>>>
>>> Hi,
>>>
>>> In my case I used PDFBox just to extract the text from the PDF document,
>>> and then I created the Lucene document from the extracted text. (I didn't
>>> use the PDFBox built-in Lucene search engine.) So I didn't get any
>>> incompatibility problems.
>>>
>>> This blog post shows the way.
>>> http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html
>>>
>>> It worked perfectly for me.
>>>
>>> Thanks.
>
> --
> Kalani Ruwanpathirana
> Department of Computer Science & Engineering
> University of Moratuwa