lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Victor Hadianto <vict...@nuix.com.au>
Subject Re: RE : Parsers
Date Wed, 28 May 2003 23:01:09 GMT
> The www.textmining.org text extractors work very well for Word and pdf
> documents.
> They use both PDFBox and POI.
>
> For Excel, using POI directly is very easy. Tell me if you want to see
> code samples.
>
> I'm looking myself for a Powerpoint text extractor, if you know one...

Another solution is to use Microsoft Office itself. You can setup a server 
that serve request to convert Microsoft Office doc. There are many ways of 
doing this, for example using Python to directly call Office then put your 
python script in a webserver.

Or you can set a .Net conversion server and you can call this .Net service 
using a Web Service, and many other interesting technique.

victor


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message