lucene-java-user mailing list archives

From Victor Hadianto <>
Subject Re: RE : Parsers
Date Wed, 28 May 2003 23:01:09 GMT
> The text extractors work very well for Word and pdf
> documents.
> They use both PDFBox and POI.
> For Excel, using POI directly is very easy. Tell me if you want to see
> code samples.
> I'm looking myself for a Powerpoint text extractor, if you know one...

Another solution is to use Microsoft Office itself. You can set up a server
that serves requests to convert Microsoft Office documents. There are many
ways of doing this; for example, use Python to call Office directly, then put
your Python script behind a web server.
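
As a rough illustration of the Python approach, here is a minimal sketch. It
assumes a Windows machine with Word installed and the pywin32 package; the
names word_to_text, make_handler and serve, and port 8000, are made up for
this example. Treat it as a starting point rather than a finished service.

```python
import os
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer


def word_to_text(path):
    """Return the plain text of a Word document by automating Word over COM.

    Assumption: Windows, an installed copy of Word, and pywin32.
    """
    import win32com.client  # lazy import so the module still loads elsewhere

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Positional arguments: FileName, ConfirmConversions, ReadOnly.
        doc = word.Documents.Open(os.path.abspath(path), False, True)
        try:
            return doc.Content.Text
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()


def make_handler(convert):
    """Build a request handler that runs convert(path) on each POSTed file."""

    class ConvertHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            data = self.rfile.read(length)
            # Word opens files, not byte streams, so spool the upload to disk.
            fd, tmp = tempfile.mkstemp(suffix=".doc")
            try:
                with os.fdopen(fd, "wb") as f:
                    f.write(data)
                body = convert(tmp).encode("utf-8")
                status = 200
            except Exception as exc:
                body = str(exc).encode("utf-8")
                status = 500
            finally:
                os.remove(tmp)
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the console quiet

    return ConvertHandler


def serve(port=8000):
    """Run the conversion server until interrupted."""
    HTTPServer(("127.0.0.1", port), make_handler(word_to_text)).serve_forever()
```

Calling serve() starts the server; the indexer then POSTs the raw .doc bytes
and gets plain text back. The handler takes the converter as a parameter so
the HTTP part can be tried without Office. This server handles one request at
a time, which is probably what you want anyway when a single Word instance is
doing the work.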

Or you can set up a .NET conversion server and call it as a Web Service.
There are many other interesting techniques as well.
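
Whichever server you pick, the calling side looks much the same. The sketch
below is a hypothetical client, not a real SOAP Web Service call: it assumes
the conversion server accepts the raw document in a plain HTTP POST and
answers with UTF-8 text. The function name extract_text and the URL in the
usage note are invented for illustration.

```python
import urllib.request


def extract_text(doc_path, service_url):
    """POST a document to a conversion service and return the extracted text.

    Assumption: the service takes raw bytes and replies with UTF-8 plain text.
    """
    with open(doc_path, "rb") as f:
        payload = f.read()
    request = urllib.request.Request(
        service_url,
        data=payload,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

For example, extract_text("report.doc", "http://localhost:8000/") would
return a string that you can put into a Lucene field. A Java indexer would
do the same thing with its own HTTP client.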

