lucene-java-user mailing list archives

From "Adriano Labate" <>
Subject RE: Parsers
Date Wed, 28 May 2003 12:03:27 GMT
The text extractors work very well for Word and PDF.
They use both PDFBox and POI.

For Excel, using POI directly is very easy. Tell me if you want to see
code samples.

I'm also looking for a PowerPoint text extractor myself, if you know of one...

Adriano Labate

-----Original Message-----
From: Pete Lewis []
Sent: Wednesday, 28 May 2003 12:48
To: Lucene Users List
Subject: Parsers

Hi all,

I have a rather nice HTML parser that I got from SourceForge.  Does
anyone know of any good parsers for PDF and the Microsoft Office suite
(.doc, .ppt, .xls, etc.)?  Any help would be much appreciated.

Pete Lewis
