lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: RE : Parsers
Date Thu, 29 May 2003 08:03:55 GMT
Victor Hadianto wrote:
>>The www.textmining.org text extractors work very well for Word and pdf
>>documents.
>>They use both PDFBox and POI.
>>
>>For Excel, using POI directly is very easy. Tell me if you want to see
>>code samples.
>>
>>I'm looking myself for a Powerpoint text extractor, if you know one...
> 
> 
> Another solution is to use Microsoft Office itself. You can set up a server 
> that serves requests to convert Microsoft Office documents. There are many 
> ways of doing this, for example using Python to call Office directly and 
> then putting your Python script behind a web server.
> 
> Or you can set up a .NET conversion server and call it as a Web Service, 
> among many other interesting techniques.
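
To illustrate the conversion-server idea quoted above, here is a minimal
sketch using only the Python standard library. It is not anyone's actual
setup from this thread. The names make_server, start_in_background,
convert_remote and placeholder_converter are all hypothetical, and the
converter is a stand-in: the real call into Office (for example through
COM automation) is deliberately left out and would have to replace it.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def placeholder_converter(data: bytes) -> str:
    """Hypothetical stand-in for the real Office call.

    A real deployment would pass `data` to Office and return the extracted
    text. This placeholder only reports how many bytes arrived.
    """
    return "received %d bytes" % len(data)


def make_server(converter=placeholder_converter, host="127.0.0.1", port=0):
    """Build an HTTP server that hands each POST body to `converter`.

    port=0 asks the operating system for any free port.
    """

    class ConvertHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            document = self.rfile.read(length)
            try:
                text = converter(document)
                status = 200
            except Exception as exc:
                # Report conversion failures to the client instead of crashing.
                text = "conversion failed: %s" % exc
                status = 500
            body = text.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # suppress per-request logging in this sketch

    return HTTPServer((host, port), ConvertHandler)


def start_in_background(server):
    """Run the server on a daemon thread so the caller is not blocked."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def convert_remote(host, port, document: bytes):
    """Client side: POST a document, return (HTTP status, response text)."""
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.request("POST", "/", body=document)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()
```

An indexing client would call convert_remote(host, port, file_bytes) and
feed the returned text to Lucene. HTTPServer handles one request at a
time, which keeps the sketch simple. Whether that is adequate depends on
how the real converter behaves.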

I'm successfully using Office automation via Jawin (a free Java/COM 
bridge) to convert PPT files. You need to learn a bit about the 
pseudo-object model of PowerPoint to convert the various objects 
properly, but this information can be found at msdn.microsoft.com.

Obviously I'd love to learn about an alternative, because then I could 
free my clients from their dependence on Office... I already use POI to 
convert XLS and DOC files, and it works _very_ well.


-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)



