lucene-java-user mailing list archives

From David Warnock
Subject Re: RE : Parsers
Date Thu, 29 May 2003 08:36:49 GMT

Another solution for all MS Office formats is to use the 
latest betas have a powerful Java SDK. So for example you could script a 
central copy to open MS Office documents and save them as HTML for parsing
in Lucene. Or you could save in formats (which are zipped XML) and 
throw those at Lucene.
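
A rough sketch of the zipped-XML route, in Java since that is what Lucene
users will have to hand. It assumes the text lives in a single XML entry
inside the zip. The class name, the method names and the entry name the
caller passes in are all made up for illustration, and the regex tag
stripper is a crude stand-in for a real XML parser.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Lucene or of any office suite SDK.
class ZippedXmlText {

    // Crude tag stripper: replaces every <...> run with a space and
    // collapses whitespace. It does not decode entities or handle CDATA.
    static String stripTags(String markup) {
        return markup.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Reads the named entry from a zip stream and returns its text with
    // tags removed, or null if the zip has no entry with that name.
    // Assumes the entry is UTF-8 encoded.
    static String extractText(InputStream zipped, String entryName) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int n;
                    while ((n = zin.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return stripTags(new String(buf.toByteArray(), StandardCharsets.UTF_8));
                }
            }
        }
        return null;
    }
}
```

You would call something like ZippedXmlText.extractText(new
FileInputStream(path), "content.xml"), where "content.xml" is only a guess
at the entry name, and then hand the returned string to Lucene as the
value of a text field.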

>> Another solution is to use Microsoft Office itself. You can set up a 
>> server that serves requests to convert Microsoft Office documents. There
>> are many ways of doing this, for example using Python to call Office
>> directly and then putting your Python script behind a web server.

David Warnock, Sundayta Ltd.
iDocSys for Document Management. VisibleResults for Fundraising.
Development and Hosting of Web Applications and Sites.
