cocoon-dev mailing list archives

From "Ross Burton"
Subject Re: MS Word Producer
Date Thu, 06 Apr 2000 14:25:22 GMT
> Just wondering if anyone had ever tried creating XML documents on the fly
> from MS Word documents.
> I've got a load of Word documents which I want to give people web access to
> browse whilst keeping the Word documents as the master.

> I was thinking of using COM/Word Basic to get at the document structure and
> produce a basic XML version of the document structure. This can then be
> placed into a cocoon producer and then passed on into a more usable
> type (e.g. PDF, HTML). I know that I'll probably lose a lot of the
> of the document but as long as I can get the general info across I'm not
> worried.

Might be slow unless you batch the generation - easy enough I suppose with
Task Scheduler and Windows Scripting.
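
As a rough illustration of the batching idea, here is a minimal sketch of the "what needs regenerating" step that a scheduled job could run before invoking whatever converter is used. The function name, the directory layout and the .doc/.xml extensions are assumptions made for this example; the conversion itself (COM, wv or anything else) is deliberately left out.

```python
from pathlib import Path


def stale_documents(src_dir, out_dir, src_ext=".doc", out_ext=".xml"):
    """Return source files whose generated output is missing or out of date.

    Hypothetical helper: the names and extensions are assumptions for
    illustration, not part of Cocoon or any Word tooling.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    stale = []
    for src in sorted(src_dir.glob(f"*{src_ext}")):
        out = out_dir / (src.stem + out_ext)
        # Regenerate when the output does not exist yet, or when the
        # Word master has been modified since the output was written.
        if not out.exists() or out.stat().st_mtime < src.stat().st_mtime:
            stale.append(src)
    return stale
```

A scheduled task would then loop over the returned paths and convert only those, so unchanged documents cost nothing on each run.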

For dynamic generation - you could look at the wv library. It was known as
mswordview and according to the blurb exports Word files to HTML.  Actually,
it exports to a weird XML format which is processed into HTML, LaTeX, LyX
etc.  It's a bit kludgy (the use of XML is rather... interesting) but in
theory a straight XML output is possible.  I was planning on writing this
sort of thing, but there are zero docs on the Word file format (the official
docs are incorrect!).  I investigated, and it appears that there are several
OLE file readers (IBM has one called DocFile in Alphaworks) so the hard bit
has been done, a Word reader converted from one of the many C should be
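
To give a feel for the container side of this, here is a minimal sketch that only checks whether a file starts with the OLE compound file signature and pulls a few fixed header fields. It assumes the commonly documented header layout (8-byte signature, 16-byte CLSID, then little-endian 16-bit minor version, major version, byte-order mark and sector shift); the function name is made up for this example, and nothing here touches the Word-specific streams inside the container, which is the undocumented part.

```python
import struct

# Signature at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def read_ole_header(data):
    """Return basic header fields if data begins an OLE compound file, else None.

    Hypothetical helper for illustration; it does not walk the FAT or
    directory, so it cannot list or extract streams.
    """
    if len(data) < 32 or data[:8] != OLE_MAGIC:
        return None
    # Bytes 8-23 hold a CLSID, which is skipped here. Bytes 24-31 hold
    # four little-endian unsigned 16-bit fields.
    minor, major, byte_order, sector_shift = struct.unpack_from("<HHHH", data, 24)
    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,  # sector size is stored as a power of two
    }
```

Reading the first 32 bytes of a .doc file and passing them to this function is enough to tell an OLE container apart from some other format before handing it to a real reader.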

> Hmmm... Doesn't Office2000 exports all the stuff in XML????

It was going to - anybody with Office 2000 know if this is its native
format?  I guess it's a compressed form too, I'm willing to put money on the
fact that it is _not_ gzip.  :-)

Ross Burton
