lucene-java-user mailing list archives

From alvaro z
Subject about PDF / HTML index
Date Tue, 15 Jul 2003 22:21:03 GMT

I'm using Lucene with TXT and HTML files, and it's working.

The only problem with HTML files is that I have to index them as TXT first, before
indexing them as HTML.

Has anyone tried to index PDF files?

I'm trying PDFBox. Are there any samples for indexing PDF files with any of the parsers
(PDFBox, JPedal, etc.)? I can't find any.

Thanks for helping,

Alvaro, from Lima, Peru
