On Dec 4, 2008, at 9:50 AM, tiziano bernardi wrote: > > I entered your code inside a main. > I have imported libraries required by mistake but me. > First error: > > parser.parse(); > > Syntax error on token "parse", Identifier expected after this token > Second error: > cd.close(); > Syntax error on token "close", Identifier expected after this token > Third error: > doc.add(new Field("content", text,Field.Store.NO \field.store.no>, > Field.Index.TOKENIZED)); What is the stuff? Seems like you've got a lot of junk in your file that isn't proper Java. > > Multiple markers at this line > - Syntax error on tokens, delete these tokens > - Syntax error on token "add", = expected after this token > - Syntax error on token ",", delete this token > - Syntax error on token(s), misplaced construct(s) >> Date: Thu, 4 Dec 2008 19:17:01 +0530> From: kalanir@gmail.com> To: java-user@lucene.apache.org >> > Subject: Re: Pdf in Lucene?> > Hi Tiziano,> > What is the error >> you got? I think you can get the text easily using the> code shown >> below.> > > FileInputStream fi = new FileInputStream(new >> File("sample.pdf"));> > PDFParser parser = new PDFParser(fi);> >> parser.parse();> COSDocument cd = parser.getDocument();> >> PDFTextStripper stripper = new PDFTextStripper();> String text = >> stripper.getText(new PDDocument(cd));> cd.close();> > After getting >> the value for text you can simply create the Lucene document.> > >> Document doc = new Document();> doc.add(new Field("content", >> text,Field.Store.NO ,> >> Field.Index.TOKENIZED));> >> >> >> >> > On Thu, Dec 4, 2008 at 6:20 >> PM, tiziano bernardi wrote:> >> >>> >> Thanks >> very kind ...> >> But I've tried that code but I do not work ...> >> >> You could send me a simple working class that uses it please?> >> >> Thanks> Date: Thu, 4 Dec 2008 15:19:26 +0530> From: kalanir@gmail.com >> >> >> To: java-user@lucene.apache.org> Subject: Re: Pdf in Lucene?> >> > Hi,> > In> >> my case I used PDFBox, just to extract the text >> from PDF document and> then> >> I created the Lucene document >> giving the extracted text. (I didn't use> the> >> PDFBox built in >> Lucene search engine). So I didn't get any> incompatibility> >> >> problems.> > This blog post shows the way.>> >> http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html >> >> >> > It worked perfect for me.> > Thanks.> >> >> _________________________________________________________________> >> >> Ci sai fare con l'italiano? Scoprilo con Typectionary!> >> http://typectionary.it.msn.com/ >> > >>> >> >> >> > --> > Kalani Ruwanpathirana> > Department of >> Computer Science & Engineering> > University of Moratuwa> >> > > > >> -- > Kalani Ruwanpathirana> Department of Computer Science & >> Engineering> University of Moratuwa > _________________________________________________________________ > Fanne di tutti i colori, personalizza la tua Hotmail! > http://imagine-windowslive.com/Hotmail/#0 -------------------------- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org