On 2 Aug 2014, at 9:24 pm, Alexandro Colorado <jza@oooes.org> wrote:

> The support that is done is to receive OOXML, not to produce it; the
> discussion issue would be to support legacy formats like .doc or .xls.
>
> I still don't see a point in generating OOXML, and most people don't care
> as long as they can send in Office native formats.
>
> I have never heard anyone say, "please send it in .docx, your .doc is a
> closed binary format."

I (and I suspect I'm not alone) see the inability to 1) save OOXML documents, and 2) do so while preserving all elements, including unsupported features and Microsoft-only data, as the #1 limitation of OpenOffice today. The fact is, OOXML is extremely widely used in practice (vastly more so than ODF), and I would argue that if OpenOffice is to have any relevance going forward, it must support OOXML, and support it well.

The migration path in particular, which I mentioned previously, is not just about importing files but about enabling a transition period of several years during which an organisation can work effectively with a mixture of OOXML and ODF documents. This allows the transition to be done incrementally: a company with 30,000 employees will only migrate if there's a way to do so bit by bit, with some departments sticking with OOXML for longer than others. Because people in different departments need to work together, those who insist on remaining with OOXML for the time being must be able to collaborate in both directions with those who have switched to ODF for all their other documents.

It's the same situation as the transition Microsoft made from the old binary formats to OOXML: Office 2007 (and all later versions) still supports the older formats for both reading and writing, and I expect it will continue to do so for some time. If Office 2007 had completely dropped support for saving .doc, .xls, and .ppt, it would have been dead on arrival, as it took several years before most people were saving in the newer formats by default.

Now there is still the question of how OpenOffice could go about supporting these formats. There is already an import filter which sort of works (though I had to direct a customer to LibreOffice the other day because they were having trouble opening a perfectly valid .docx in OpenOffice). This could be left in place, with fixes where necessary, and a new export filter written for saving. The problem with this, however, is that import/export is inherently a lossy process: any information within a document that is not supported by OO or the filters will be lost after an open/save cycle. This information could also include proprietary extension data that Office supports but that there is no way to interpret, since its format is not published (macros, I believe, are one example of this).
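To illustrate why a separate import/export pair is lossy, here is a minimal Python sketch. The element names (<body>, <p>, <chart>) and the functions import_doc/export_doc are hypothetical stand-ins, not real OOXML or OpenOffice filter code: the "editor" only understands paragraphs, so when it regenerates the file from its internal model on save, the <chart> element it could not represent simply disappears.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a document body: <p> is something the
# editor understands; <chart> stands for anything it does not (an embedded
# object, vendor-specific extension data, etc.).
ORIGINAL = "<body><p>Hello</p><chart data='q3'/><p>World</p></body>"

KNOWN_TAGS = {"p"}


def import_doc(xml_text):
    """Import filter: keep only the parts the internal model can represent."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root if el.tag in KNOWN_TAGS]


def export_doc(paragraphs):
    """Export filter: regenerate the whole file from the internal model alone."""
    root = ET.Element("body")
    for text in paragraphs:
        ET.SubElement(root, "p").text = text
    return ET.tostring(root, encoding="unicode")


model = import_doc(ORIGINAL)
model[0] = "Hello, edited"      # the user edits the first paragraph
saved = export_doc(model)       # the <chart> element is gone from the output
print(saved)
```

The user only touched one paragraph, yet the save dropped content they never edited, which is exactly the kind of loss people end up blaming on the application.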

The approach I took with UX Write was to use bidirectional transformations [1], which ensure that updates happen in a non-destructive manner. When you open a .docx file in UX Write, it converts it into HTML and keeps track of information that allows it to map each HTML element back to the original XML element in the .docx file from which it was generated. When you save the file, instead of overwriting it with a new version, it *updates* the existing version by figuring out what changes have occurred in the HTML document and applying those changes to the original .docx file. This way, only the parts that the user has actually modified are touched; anything UX Write doesn't know about (e.g. embedded spreadsheets) is left untouched. I'm planning to use the same design for ODF.
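A toy Python sketch of the same idea, under heavy simplifying assumptions: the "source" is a tiny hypothetical XML body rather than a real .docx, the "view" is a list of paragraph dicts rather than HTML, and each view item remembers its source element by position. get() builds the editable view; put() applies the edited view back onto a copy of the original tree, changing only paragraphs whose text differs and removing deleted ones, so the unsupported <chart> element survives. Inserting new paragraphs is left out to keep it short, and none of this is UX Write's actual code.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source document; <chart> stands for content the editor
# does not understand and must therefore leave alone.
ORIGINAL = "<body><p>Hello</p><chart data='q3'/><p>World</p></body>"


def get(source_root):
    """Forward direction: build an editable view of the paragraphs,
    remembering which source element (by position) each one came from."""
    view = []
    for index, el in enumerate(source_root):
        if el.tag == "p":
            view.append({"src": index, "text": el.text or ""})
    return view


def put(edited_view, source_root):
    """Backward direction: apply the edited view to a copy of the original.
    Only changed paragraphs are modified and deleted ones removed; elements
    the view does not cover are carried over untouched."""
    updated = copy.deepcopy(source_root)
    children = list(updated)              # positions as they were in the source
    kept = {item["src"] for item in edited_view}
    for item in edited_view:
        target = children[item["src"]]
        if target.text != item["text"]:   # touch only what actually changed
            target.text = item["text"]
    for index, el in enumerate(children):
        if el.tag == "p" and index not in kept:
            updated.remove(el)            # paragraph deleted in the view
    return updated


source = ET.fromstring(ORIGINAL)
view = get(source)
view[0]["text"] = "Hello, edited"   # the user edits the first paragraph
del view[1]                         # ...and deletes the second one
result = ET.tostring(put(view, source), encoding="unicode")
print(result)                       # the <chart> element is still there
```

Calling put(get(source), source) with no edits reproduces the original tree unchanged, which corresponds to the "GetPut" law from the lenses literature: saving a document you haven't modified doesn't alter it.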

Crucially, this meant that I was able to implement support for OOXML (well, specifically the WordprocessingML part of it) in an incremental fashion. First there was only support for editing text; then came basic formatting, then lists, tables, styles, etc. Even today, my implementation doesn't support the complete feature set, but it is nonetheless able to "walk lightly" when editing a document, by not touching anything that isn't supported. This is exactly what the migration path I mentioned above requires: the ability to interoperate with people using OOXML for some period of time, on the assumption that they are eventually led towards using only ODF.

I'd be keen to hear any thoughts others have on this issue, in the sense of how best to tackle it within OpenOffice.

I recommend having a look at the slides linked below, which give a great introduction to what bidirectional transformation is and how it works. There has been a ton of research done on this in the past, and I think it's ideal for dealing with different document formats, particularly when a given app treats a particular format as "native" (HTML in the case of UX Write, ODF in the case of OpenOffice). With this approach, we could bypass an entire class of compatibility problems where people complain about losing formatting or other information from their documents (and blame it on OpenOffice, telling their collaborators to use Microsoft Office instead).

[1] http://www.cis.upenn.edu/~bcpierce/papers/icmt-2009-slides.pdf

Dr. Peter M. Kelly
Founder, UX Productivity

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)