cocoon-dev mailing list archives

From "Jeff Martin" <jeff.mar...@dial.pipex.com>
Subject Re: MS Word Producer
Date Thu, 06 Apr 2000 20:27:44 GMT
Thanks for that. I'll definitely take a look at these.

-----Original Message-----
From: Ross Burton <ross.burton@mail.com>
To: cocoon-dev@xml.apache.org <cocoon-dev@xml.apache.org>
Date: 06 April 2000 18:07
Subject: Re: MS Word Producer


>> Just wondering if anyone had ever tried creating XML documents on the fly
>> from MS Word documents.
>> I've got a load of Word documents which I want to give people web access
>> to browse whilst keeping the Word documents as the master.
>
>> I was thinking of using COM/Word Basic to get at the document structure
>> and produce a basic XML version of the document structure. This can then
>> be placed into a cocoon producer and then passed on into a more usable
>> document type (e.g. PDF, HTML). I know that I'll probably lose a lot of
>> the structure of the document but as long as I can get the general info
>> across I'm not too worried.
>
>Might be slow unless you batch the generation - easy enough I suppose with
>Task Scheduler and Windows Scripting.
>
>For dynamic generation - you could look at the wv library. It was known as
>mswordview and according to the blurb exports Word files to HTML. Actually,
>it exports to a weird XML format which is processed into HTML, LaTeX, LyX
>etc.  It's a bit kludgy (the use of XML is rather... interesting) but in
>theory a straight XML output is possible.  I was planning on writing this
>sort of thing, but there are zero docs on the Word file format (the
>official docs are incorrect!).  I investigated, and it appears that there
>are several OLE file readers (IBM has one called DocFile in Alphaworks) so
>the hard bit has been done; a Word reader converted from one of the many C
>readers should be possible.
>
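As a small illustration of the OLE point above: a binary Word document lives inside an OLE compound file, which begins with a fixed 8-byte signature, whereas a gzip stream begins with the two bytes 1F 8B. The Java sketch below only sniffs those leading bytes; it does not read the Word format itself. The class and method names (FormatSniffer, sniff) are made up for this example and do not come from wv, DocFile or Cocoon.

```java
final class FormatSniffer {
    // Signature at the start of an OLE compound file (the container for binary .doc).
    private static final int[] OLE_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // First two bytes of a gzip stream.
    private static final int[] GZIP_MAGIC = {0x1F, 0x8B};

    // Hypothetical helper: classify a file from its leading bytes only.
    // Returns "ole", "gzip" or "unknown".
    static String sniff(byte[] head) {
        if (startsWith(head, OLE_MAGIC)) return "ole";
        if (startsWith(head, GZIP_MAGIC)) return "gzip";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data == null || data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            // Mask to compare as unsigned byte values.
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }
}
```

Reading the first eight bytes of a file and passing them to sniff would be enough to check whether a saved document is an OLE container or gzip data before handing it to a real reader.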
>> Hmmm... Doesn't Office 2000 export all the stuff in XML????
>
>It was going to - anybody with Office 2000 know if this is its native
>format?  I guess it's a compressed form too; I'm willing to put money on
>the fact that it is _not_ gzip.  :-)
>
>Ross Burton
>
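The idea from the original question, producing a basic XML version of the document structure, could be sketched roughly as below. This is only an illustration under assumptions: it presumes the paragraphs have already been pulled out of Word (via COM/Word Basic or otherwise) as style-name/text pairs, and the element and attribute names (document, para, style) are invented here, not a Cocoon or Word schema.

```java
final class WordToXmlSketch {
    // Escape the characters that cannot appear verbatim in XML text or attribute values.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Each row is {styleName, text}; the layout is an assumption made for this sketch.
    static String toXml(String[][] paragraphs) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n<document>\n");
        for (String[] p : paragraphs) {
            xml.append("  <para style=\"").append(escape(p[0])).append("\">")
               .append(escape(p[1])).append("</para>\n");
        }
        return xml.append("</document>\n").toString();
    }
}
```

The resulting string is the sort of thing a producer could hand on for styling into HTML or PDF; most of the Word structure is lost, which matches the "general info" goal stated in the question.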

