lucene-java-user mailing list archives

From David Spencer <>
Subject ppt text extraction - Re: SearchBlox J2EE Search Component Version 1.2 released
Date Tue, 17 Feb 2004 13:53:11 GMT
Eric Jain wrote:

>>- Support for PowerPoint documents
>May I ask how you extract text from PowerPoint documents? Any open
>source tool, or your own code?

FYI I recently discovered "ppthtml" in this package:

Also "antiword" seems to work well for Word docs.

Also also also... I use a utility from xpdf for PDF text extraction.

When you get down to it, I have found that the "portable C" tools (above)
work better than the pure Java ones available. To be fair, however, I have
found that POI does work fine for XLS docs.
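A minimal Java sketch of how command-line converters like the ones above could be driven from an indexer. It runs the tool as a child process and captures its standard output as a String, which could then be added to a Lucene document. The class and method names are hypothetical, the sketch assumes the tools are installed and on the PATH, and the charset argument has to match whatever encoding the tool actually emits.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical helper: runs an external text converter and returns its stdout.
final class ExternalTextExtractor {

    private ExternalTextExtractor() {}

    /**
     * Runs the given command (program name plus arguments) and returns
     * everything it writes to standard output, decoded with the given charset.
     * Throws IOException if the tool cannot be started or exits non-zero.
     */
    public static String extract(List<String> command, Charset charset)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Pass the tool's error messages through to our own stderr so that an
        // unread stderr pipe can never fill up and stall the child process.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();

        // The converters read the file named on the command line, not stdin.
        p.getOutputStream().close();

        // Drain stdout completely before waiting, otherwise a large document
        // could block the tool on a full pipe buffer.
        byte[] out;
        try (InputStream in = p.getInputStream()) {
            out = in.readAllBytes();
        }

        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException(command.get(0) + " exited with status " + exit);
        }
        return new String(out, charset);
    }
}
```

For example, `extract(List.of("antiword", "report.doc"), cs)` for Word files, `extract(List.of("ppthtml", "slides.ppt"), cs)` for PowerPoint (its output is HTML, so the tags would still need stripping before indexing), or `extract(List.of("pdftotext", "paper.pdf", "-"), cs)` with xpdf's pdftotext, where `-` asks it to write the text to standard output. The file names are placeholders. Running the tools out of process also keeps a crash on a malformed file from taking down the JVM.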

 - Dave

>To unsubscribe, e-mail:
>For additional commands, e-mail:
