lucene-java-user mailing list archives

From Andrzej Bialecki
Subject Re: RE : Parsers
Date Thu, 29 May 2003 08:03:55 GMT
Victor Hadianto wrote:
>>The text extractors work very well for Word and pdf
>>They use both PDFBox and POI.
>>For Excel, using POI directly is very easy. Tell me if you want to see
>>code samples.
>>I'm looking myself for a Powerpoint text extractor, if you know one...
> Another solution is to use Microsoft Office itself. You can set up a server
> that serves requests to convert Microsoft Office docs. There are many ways of
> doing this, for example using Python to call Office directly and then putting
> your Python script behind a web server.
> Or you can set up a .NET conversion server and call it as a Web Service;
> there are many other interesting techniques as well.

I'm successfully using Office automation via Jawin (a free Java/COM
bridge) to convert PPT files. You need to learn a bit
about the pseudo-object model of PowerPoint to properly convert various 
objects, but this information can be found at

Obviously I'd love to learn about an alternative, because then I could 
free my clients from dependence on Office... I already use POI to
convert XLS and DOC files, and it works _very_ well.
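
Since each format ends up with its own backend (POI for XLS and DOC, Office automation for PPT), one way to keep the PPT converter swappable is to hide every backend behind a small interface and dispatch on the file extension. The following is only a minimal sketch under that assumption: TextExtractor and ExtractorRegistry are hypothetical names made up for illustration, not part of POI, Jawin, or Lucene, and the actual POI or COM calls would live inside the registered implementations.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical interface: turns the raw bytes of one document into plain text. */
interface TextExtractor {
    String extract(byte[] document);
}

/** Hypothetical registry: maps file extensions to extractors so a backend can be replaced without touching callers. */
class ExtractorRegistry {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    /** Registers an extractor for an extension such as "xls"; matching is case-insensitive. */
    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Finds the extractor for a file name, or empty if the extension is missing or unknown. */
    Optional<TextExtractor> lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(extension));
    }

    /** Extracts text with the matching backend; fails loudly when no backend is registered. */
    String extract(String fileName, byte[] document) {
        TextExtractor extractor = lookup(fileName)
            .orElseThrow(() -> new IllegalArgumentException("No extractor for " + fileName));
        return extractor.extract(document);
    }
}
```

With this layout, replacing the Office-based PPT converter with a pure-Java one would mean registering a different implementation under "ppt"; the indexing code that calls extract() would stay the same.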

Best regards,
Andrzej Bialecki

Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
FreeBSD developer (
