lucene-java-user mailing list archives

From David Warnock <da...@sundayta.com>
Subject Re: RE : Parsers
Date Thu, 29 May 2003 08:36:49 GMT
Andrzej,

Another solution for all MS Office formats is to use OpenOffice.org; the 
latest betas have a powerful Java SDK. So, for example, you could script a 
central copy to open MS Office documents and save them as HTML for parsing 
in Lucene. Or you could save them in OpenOffice.org formats (which are 
zipped XML) and throw those at Lucene.
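As a rough illustration of the second option, here is a minimal Java sketch that pulls the plain text out of an OpenOffice.org file so it could be handed to Lucene. It relies only on the point above, that the file is a zip archive of XML. The class name `OpenOfficeTextExtractor` is hypothetical, and reading the `content.xml` entry and breaking lines after `text:p`/`text:h` elements are my assumptions about the file layout. None of this is part of the OpenOffice.org SDK or the Lucene API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: pulls plain text out of an OpenOffice.org (zipped XML) file. */
public class OpenOfficeTextExtractor {

    /** Scans the zip archive for the content.xml entry (assumed to hold the body) and returns its text. */
    public static String extractText(InputStream zipped) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    return textFromXml(zip);
                }
            }
        }
        throw new IOException("no content.xml entry found");
    }

    /** Collects character data with SAX, ending each paragraph or heading with a newline. */
    private static String textFromXml(InputStream xml) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is "p" for <text:p>
        SAXParser parser = factory.newSAXParser();
        parser.parse(xml, new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Skip any referenced DTD (e.g. office.dtd); it is not needed to read the text
                return new InputSource(new StringReader(""));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Assumed element names: text:p (paragraph) and text:h (heading)
                if ("p".equals(localName) || "h".equals(localName)) {
                    text.append('\n');
                }
            }
        });
        return text.toString();
    }
}
```

The returned string could then be added to a Lucene document field in the usual way; the Lucene side is left out because its API details depend on the version you run. Parsing with SAX keeps memory use flat even for large documents, at the cost of ignoring formatting such as tabs and explicit spaces.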

Dave
>> Another solution is to use Microsoft Office itself. You can set up a 
>> server that serves requests to convert Microsoft Office documents. There 
>> are many ways of doing this; for example, use Python to call Office 
>> directly, then put your Python script on a web server.


-- 
David Warnock, Sundayta Ltd. http://www.sundayta.com
iDocSys for Document Management. VisibleResults for Fundraising.
Development and Hosting of Web Applications and Sites.



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

