Eric Jain wrote:
>>- Support for PowerPoint documents
>
>May I ask how you extract text from PowerPoint documents? Any open
>source tool, or your own code?
>
FYI, I recently discovered "ppthtml" in this package:
http://chicago.sourceforge.net/xlhtml/
"antiword" also seems to work well for Word docs, and I use a utility
from xpdf (http://www.foolabs.com/xpdf/) for PDF text extraction.
When you get down to it, I have found that the "portable C" tools above
work better than the pure Java ones available. To be fair, however, I
have found that POI does work fine for XLS docs.
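
In case it helps, here is a rough sketch of how one might call these
command-line tools from Java and capture their output before indexing.
The class and method names (ExternalTextExtractor, commandFor, extract)
are made up for illustration. It assumes the tools are on the PATH, that
"pdftotext" (the xpdf utility) is given "-" as the output file so it
writes to stdout, that antiword and ppthtml print to stdout by default,
and that the output is UTF-8, which may not hold on every setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ExternalTextExtractor {

    // Builds the argument list for a tool (hypothetical helper).
    // Only pdftotext gets the extra "-" argument, which asks it to
    // write to stdout instead of creating a .txt file.
    static List<String> commandFor(String tool, String path) {
        List<String> cmd = new ArrayList<>();
        cmd.add(tool);
        cmd.add(path);
        if (tool.equals("pdftotext")) {
            cmd.add("-");
        }
        return cmd;
    }

    // Runs the tool and returns whatever it printed to stdout.
    // Assumes UTF-8 output; stderr is discarded (Java 9+).
    static String extract(String tool, String path)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(commandFor(tool, path))
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String out;
        try (InputStream in = p.getInputStream()) {
            out = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException(tool + " exited with status " + exit);
        }
        return out;
    }
}
```

Something like ExternalTextExtractor.extract("antiword", "report.doc")
would then give you a String to put in a Lucene field. Note that ppthtml
produces HTML, so its output would still need the tags stripped.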
- Dave
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org