lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: Pdf in Lucene?
Date Sun, 07 Dec 2008 02:33:24 GMT

On Dec 4, 2008, at 9:50 AM, tiziano bernardi wrote:

>
> I entered your code inside a main.
> I have imported libraries required by mistake but me.
> First error:
>
> parser.parse();
>
> Syntax error on token "parse", Identifier expected after this token
> Second error:
> cd.close();
> Syntax error on token "close", Identifier expected after this token
> Third error:
> doc.add(new Field("content", text,Field.Store.NO <http: 
> \field.store.no>,
> Field.Index.TOKENIZED));

What is the <http:\field.store.no> stuff?  Seems like you've got a lot  
of junk in your file that isn't proper Java.

>
> Multiple markers at this line
> - Syntax error on tokens, delete these tokens
> - Syntax error on token "add", = expected after this token
> - Syntax error on token ",", delete this token
> - Syntax error on token(s), misplaced construct(s)
>> Date: Thu, 4 Dec 2008 19:17:01 +0530> From: kalanir@gmail.com> To: java-user@lucene.apache.org

>> > Subject: Re: Pdf in Lucene?> > Hi Tiziano,> > What is the error
 
>> you got? I think you can get the text easily using the> code shown  
>> below.> > > FileInputStream fi = new FileInputStream(new  
>> File("sample.pdf"));> > PDFParser parser = new PDFParser(fi);>  
>> parser.parse();> COSDocument cd = parser.getDocument();>  
>> PDFTextStripper stripper = new PDFTextStripper();> String text =  
>> stripper.getText(new PDDocument(cd));> cd.close();> > After getting  
>> the value for text you can simply create the Lucene document.> >  
>> Document doc = new Document();> doc.add(new Field("content",  
>> text,Field.Store.NO <http://field.store.no/>,>  
>> Field.Index.TOKENIZED));> >> >> >> >> > On Thu, Dec
4, 2008 at 6:20  
>> PM, tiziano bernardi <dk1982@hotmail.it>wrote:> >> >>> >>
Thanks  
>> very kind ...> >> But I've tried that code but I do not work ...>  
>> >> You could send me a simple working class that uses it please?>  
>> >> Thanks> Date: Thu, 4 Dec 2008 15:19:26 +0530> From: kalanir@gmail.com

>> >> >> To: java-user@lucene.apache.org> Subject: Re: Pdf in Lucene?>
 
>> > Hi,> > In> >> my case I used PDFBox, just to extract the text
 
>> from PDF document and> then> >> I created the Lucene document  
>> giving the extracted text. (I didn't use> the> >> PDFBox built in  
>> Lucene search engine). So I didn't get any> incompatibility> >>  
>> problems.> > This blog post shows the way.>> >> http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html

>> >> >> > It worked perfect for me.> > Thanks.> >>  
>> _________________________________________________________________>  
>> >> Ci sai fare con l'italiano? Scoprilo con Typectionary!> >> http://typectionary.it.msn.com/

>> > >>> >> >> >> > --> > Kalani Ruwanpathirana>
> Department of  
>> Computer Science & Engineering> > University of Moratuwa> >> >
> >  
>> -- > Kalani Ruwanpathirana> Department of Computer Science &  
>> Engineering> University of Moratuwa
> _________________________________________________________________
> Fanne di tutti i colori, personalizza la tua Hotmail!
> http://imagine-windowslive.com/Hotmail/#0

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message