lucene-java-user mailing list archives

From Ben Litchfield <...@csh.rit.edu>
Subject Re: about PDF / HTML index
Date Wed, 16 Jul 2003 10:29:46 GMT

PDFBox comes with the class
org.pdfbox.searchengine.lucene.LucenePDFDocument, which shows how to
parse and index a PDF document.

Ben
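
For readers following along, here is a minimal sketch of the loop that would
sit around that class. It is not taken from PDFBox or Lucene: PdfIndexSketch,
Converter, Sink and indexAll are hypothetical names, and the two interfaces are
stand-ins so the example compiles without either library. The comment in main
shows where the real calls would go, assuming LucenePDFDocument exposes a
static getDocument(File) and that a Lucene IndexWriter is already open.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PdfIndexSketch {

    /** Hypothetical stand-in for the PDF-to-document step. */
    interface Converter<D> {
        D convert(File pdf) throws IOException;
    }

    /** Hypothetical stand-in for the index writer. */
    interface Sink<D> {
        void add(D document) throws IOException;
    }

    /** True when the file name ends in .pdf, ignoring case. */
    static boolean isPdf(File file) {
        return file.getName().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    /**
     * Converts every PDF in the list and hands it to the sink.
     * A file that fails to convert or to be added is reported and skipped,
     * so one bad PDF does not abort the whole run.
     * Returns the number of files that were added.
     */
    static <D> int indexAll(List<File> files, Converter<D> converter, Sink<D> sink) {
        int indexed = 0;
        for (File file : files) {
            if (!isPdf(file)) {
                continue; // leave TXT and HTML files to their own handlers
            }
            try {
                sink.add(converter.convert(file));
                indexed++;
            } catch (IOException e) {
                System.err.println("Skipping " + file + ": " + e.getMessage());
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<File> files = Arrays.asList(new File("report.pdf"), new File("notes.txt"));
        List<String> index = new ArrayList<>();
        // Assumption: with PDFBox and Lucene on the classpath, the stand-ins
        // below would be replaced by something like
        //   LucenePDFDocument::getDocument   (converter)
        //   writer::addDocument              (sink, writer being an IndexWriter)
        int count = indexAll(files, File::getName, index::add);
        System.out.println(count + " PDF file(s) indexed: " + index);
    }
}
```

The stand-in converter here only records the file name; the point is the
filtering and per-file error handling, which stay the same once the real
converter and writer are plugged in.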


On Tue, 15 Jul 2003, alvaro z wrote:

>
> I'm using Lucene with TXT and HTML files, and it's working.
>
> The only problem with HTML files is that I have to index them as TXT first, before
> indexing them as HTML.
>
> Has anyone tried to index PDF files?
>
> I'm trying PDFBox. Are there any samples for indexing PDF files with any of the
> parsers (PDFBox, JPedal, etc.)? I can't find any.
>
> Thanks for helping,
>
> Alvaro, from Lima, Peru
