openoffice-dev mailing list archives

From Peter Kelly
Subject Re: OOXML
Date Sat, 02 Aug 2014 16:42:55 GMT
On 2 Aug 2014, at 9:24 pm, Alexandro Colorado wrote:

> The support that is done is to receive OOXML, not to produce it; the
> discussion issue would be to support legacy formats like .doc or .xls.
> I still don't see a point to generate OOXML and most people don't care
> as long as they can send in office native formats.
> I never heard someone saying, please send it in docx, your doc is a
> closed binary format.

I (and I suspect I'm not alone) see the lack of the ability to 1) save OOXML documents and 2)
do so while preserving all elements, including unsupported features and Microsoft-only data,
as being the #1 limitation of OpenOffice today. The fact is, OOXML is in practice extremely
widely used (vastly more so than ODF) and I argue that if OpenOffice is to have any relevance
going forward it must support it, and support it well.

The migration path in particular, which I mentioned previously, is not just about importing
files but enabling a period of a number of years during which an organisation can effectively
work with a mixture of OOXML and ODF documents. This allows the transition to be done incrementally
- a company with 30,000 employees will only migrate if there's a way they can do so bit-by-bit,
with some departments sticking with OOXML for longer than others. Because there will be people
in different departments that need to work together, those who insist on remaining with OOXML
for the time being must be able to collaborate in both directions with those who have switched
for all their other documents.

It's the same situation as the transition Microsoft made from the old binary formats to OOXML
- Office 2007 (and all later versions) still support the older formats, for both read and
write, and I expect they will continue for some time. If Office 2007 had completely dropped
support for saving .doc, .xls, and .ppt, it would have been dead-on-arrival, as it took several
years before most people were saving in the newer format by default.

Now there is still the question of how OpenOffice could go about supporting these formats.
There is already an import filter which sort-of works (though I had to direct a customer to
LibreOffice the other day as they were having trouble opening a perfectly-valid .docx using
OpenOffice). This could be left in place, with fixes where necessary, and a new export filter
written for saving. The problem with this however is that import/export is inherently a lossy
process; if there is any information within a document that is not supported by OO or the
filters, then it will be lost after an open/save. This information could also include proprietary
extension data that is supported by Office which there is no way to interpret since its format
is not published (macros, I believe, are an example of this).
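
To make the lossy round trip concrete, here is a minimal sketch in Python. It is not how the OpenOffice filters are written; the dictionary-based document model and the names KNOWN_KEYS, import_filter, export_filter and vendor_ext are all hypothetical, chosen only to show where the data disappears.

```python
# Hypothetical model: each element of the external file is a dict of fields.
# The application only understands the fields listed in KNOWN_KEYS.
KNOWN_KEYS = {"text", "bold"}

def import_filter(external):
    """Convert the external file into the internal model, keeping known fields only."""
    return [{k: v for k, v in el.items() if k in KNOWN_KEYS} for el in external]

def export_filter(internal):
    """Write a brand new external file from the internal model."""
    return [dict(el) for el in internal]

# "vendor_ext" stands in for proprietary data the filters cannot interpret.
original = [{"text": "Budget", "bold": True, "vendor_ext": "opaque-data"}]

# Open followed by save, with no user edits at all.
round_tripped = export_filter(import_filter(original))

# Fields that were silently dropped by the open/save cycle.
lost = set(original[0]) - set(round_tripped[0])
```

Even though the user changed nothing, the saved file no longer contains the field the filters did not recognise.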

The approach I took with UX Write was to use bidirectional transformation [1], which ensures
updates happen in a non-destructive manner. When you open a .docx file in UX Write, it converts
it into HTML, and keeps track of information that allows it to map each HTML element back
to the original XML element in the .docx file from which it was generated. When you save the
file, instead of overwriting it with a new version, it *updates* the existing version by figuring
out what changes have occurred in the HTML document, and applying those changes to the original
.docx file. This way, only the parts that the user has actually modified are touched; anything
UX Write doesn't know about (e.g. embedded spreadsheets) is left untouched. I'm planning to
use the same design for ODF.
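
As a rough illustration of the idea (not the actual UX Write code), the sketch below models a document as a list of elements and implements the two directions of a bidirectional transformation in Python. The Element class, the SUPPORTED set and the get/put function names are assumptions made for this example; a real implementation works on the XML tree inside the .docx package and on an HTML DOM.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Element:
    """One node of the (hypothetical) original document."""
    id: int
    kind: str          # e.g. "paragraph" or "embedded-spreadsheet"
    text: str = ""
    extra: tuple = ()  # attributes the editor does not understand

# Kinds of element the editor knows how to display and edit (assumed).
SUPPORTED = {"paragraph"}

def get(source):
    """Forward direction: project the source onto an editable view.

    Each view item carries the id of the source element it came from,
    which is the mapping information needed to go back again.
    """
    return [(e.id, e.text) for e in source if e.kind in SUPPORTED]

def put(source, view):
    """Backward direction: apply edits made in the view to the original source."""
    edited = dict(view)
    result = []
    for e in source:
        if e.kind not in SUPPORTED:
            result.append(e)  # unknown content passes through untouched
        elif e.id in edited:
            # Only the text changes; attributes in e.extra are preserved.
            result.append(replace(e, text=edited[e.id]))
        # A supported element missing from the view was deleted by the user.
    return result

source = [
    Element(1, "paragraph", "Hello", extra=(("revision", "00A1"),)),
    Element(2, "embedded-spreadsheet", extra=(("blob", "binary-data"),)),
    Element(3, "paragraph", "World"),
]

view = get(source)             # what the editor shows
view[0] = (1, "Hello there")   # the user edits the first paragraph
updated = put(source, view)    # update the original rather than regenerate it
```

The embedded spreadsheet never appears in the view, yet it survives the save because put starts from the original document instead of building a new one. Saving an unedited view gives back the original unchanged, which is the property that lets support for features be added one at a time.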

Crucially, this meant that I was able to implement support for OOXML (well, specifically the
WordprocessingML part of it) in an incremental fashion. First there was only support for editing
text; then came basic formatting, then lists, tables, styles etc. Even today, my implementation
doesn't have support for the complete feature set, but it is nonetheless able to "walk lightly"
in editing the document, by not touching anything that isn't supported. Coming back to the
migration path I mentioned above, this is what makes it possible to interoperate with people
using OOXML for some period of time, assuming they're eventually led towards using only ODF.

I'd be keen to hear any thoughts others have on this issue, in the sense of how best to tackle
it within OpenOffice.

I recommend having a look at the slides linked to below, which give a great introduction to
what bidirectional transformation is and how it works. There's been a ton of research done
on this in the past, and I think it's ideal for dealing with different document formats,
particularly when a given app treats a particular format as "native" (HTML in the case
of UX Write, ODF in the case of OpenOffice). With this approach, we could bypass an entire
class of compatibility problems where people complain of losing formatting or other information
from their documents (and blame it on OpenOffice, telling their collaborators to use Microsoft
Office instead).


Dr. Peter M. Kelly
Founder, UX Productivity

PGP key:
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
