openoffice-dev mailing list archives

From Peter Kelly <>
Subject Change tracking & versioning (was Re: OOXML)
Date Sun, 03 Aug 2014 11:55:59 GMT
On 3 Aug 2014, at 3:05 am, Dennis E. Hamilton <> wrote:

> In line with the sketch that Peter Kelly provides below, I am personally very sympathetic
> to the idea of having an internal model that can tolerate difference in format between input
> and output while preserving in the output everything from the input format it can, even by
> leaving markers that will be useful on future input of the produced form. (There is a well-known
> case of Microsoft Office doing this for HTML it exports, although the added information for
> recovery of the MSO rendition led to many complaints about document bloat.)
> There are some conflicts between the desire to do this and the fact that some alterations
> have non-local consequences and may have other effects. I still support the idea, but there
> are some tricky cases, including
> - Changes that overlap/conflict with tracked changes but tracked changes are not updated/preserved

I'm probably getting a bit off-topic here, but this issue is one of the reasons I advocate
an approach that keeps change tracking information separate from the content itself, rather
than embedded in it. In my mind, Git provides the perfect model for this, although integrating
it (or something else based on a similar model) into a word processor or office suite remains,
shall we say, a rather significant problem to solve, both in terms of the theoretical model
and in how it would be exposed in the user interface.

By itself, keeping the change information separate wouldn't solve the problem of inconsistency
when the file is modified by an implementation with no knowledge of change tracking information.
However, this could be addressed with a data model based on that of a version control system:
one that can access the previous version of the file as well as the current one, find the
differences between the two, and allow the user to apply those differences.
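The detect-and-apply step described above can be sketched with Python's standard `difflib` module. This is a minimal illustration under loose assumptions: it compares two versions of a part such as content.xml line by line, whereas a real implementation would more likely diff the XML element tree. The function names `describe_changes` and `apply_changes` are hypothetical.

```python
import difflib


def describe_changes(previous: str, current: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (operation, old_lines, new_lines) for each region that differs."""
    old, new = previous.splitlines(), current.splitlines()
    changes = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=old, b=new).get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


def apply_changes(previous: str, current: str, accepted: set[int]) -> str:
    """Rebuild the text, keeping only the changes whose index is in `accepted`.

    Indices match the order returned by describe_changes(); rejected changes
    fall back to the previous version, much like rejecting a tracked change.
    """
    old, new = previous.splitlines(), current.splitlines()
    result, index = [], 0
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=old, b=new).get_opcodes():
        if tag == "equal":
            result.extend(old[i1:i2])
        else:
            result.extend(new[j1:j2] if index in accepted else old[i1:i2])
            index += 1
    return "\n".join(result)


previous = "<text:p>Hello</text:p>\n<text:p>World</text:p>"
current = "<text:p>Hello</text:p>\n<text:p>Brave new world</text:p>"
for op, before, after in describe_changes(previous, current):
    print(op, before, "->", after)
```

Accepting every change reproduces the current version; accepting none reproduces the previous one.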

Let's say, just as a mental exercise, that we were to embed a git repository directly within
an ODF file. That is, the .odt file is a zip archive containing the usual content.xml, styles.xml,
etc., and also a .git directory holding the complete revision history of all these separate
files. When you save the document in an implementation that does not support any change
tracking or versioning, it would just overwrite the XML files, in the same way a text editor
writes a file to disk. When you save the document in an implementation that *does* support
this, however, it overwrites the files and *then* does a git commit.
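A toy model of the two save paths might look like the following. It assumes a hypothetical `.history/` directory inside the zip standing in for .git (real git stores compressed, content-addressed objects rather than plain snapshots), and it ignores ODF packaging rules such as placing an uncompressed `mimetype` entry first. All helper names are illustrative only.

```python
import hashlib
import io
import zipfile

# Hypothetical stand-in for the .git directory inside the package.
HISTORY = ".history/"


def read_archive(data: bytes) -> dict[str, bytes]:
    """Return every entry of a zip archive as {name: content}."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def write_archive(entries: dict[str, bytes]) -> bytes:
    """Write entries to a new zip archive (ignores ODF's mimetype-first rule)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in entries.items():
            zf.writestr(name, content)
    return buffer.getvalue()


def save(data: bytes, files: dict[str, bytes], versioning: bool) -> bytes:
    """Overwrite the working-copy files; commit a snapshot only if versioning is on."""
    entries = read_archive(data) if data else {}
    entries.update(files)  # both kinds of implementation overwrite the XML parts
    if versioning:
        working = {n: c for n, c in entries.items() if not n.startswith(HISTORY)}
        parent = entries.get(HISTORY + "HEAD", b"")
        digest = hashlib.sha1(parent)  # commit id depends on parent and contents
        for name in sorted(working):
            digest.update(name.encode() + b"\0" + working[name])
        commit = digest.hexdigest()
        for name, content in working.items():
            entries[f"{HISTORY}{commit}/{name}"] = content
        entries[f"{HISTORY}{commit}/PARENT"] = parent
        entries[HISTORY + "HEAD"] = commit.encode()
    return write_archive(entries)


doc = save(b"", {"content.xml": b"<p>v1</p>"}, versioning=True)   # implementation A
doc = save(doc, {"content.xml": b"<p>v2</p>"}, versioning=False)  # implementation B
```

After the second save, the working copy holds v2 while the recorded history still ends at the v1 commit.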

With this approach, if you were to first create a file in implementation A, which supports
this versioning, you'd have a zip file with a git repository and one or more commits, and
the "working copy" (that is, all the files within the zip archive outside of the .git directory)
would be "clean" (up to date). If you then open and save it in implementation B, which does
not support versioning, it would leave the .git directory in the zip file untouched and simply
save over the XML files. When you then open it in implementation A again, it can see that the
working copy is not clean and that there are outstanding changes. These could then be displayed
in the editor in the same way as tracked changes are displayed currently, without the user
noticing any difference. And you'd have the benefit of knowing the derivation relationships
between versions, so if you get back two different versions of a document that share the same
ancestor, you could do a merge.
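Detecting outstanding changes and merging two descendants of a common ancestor can likewise be sketched over snapshots represented as plain dictionaries mapping part names to bytes (an assumption; how the snapshots are extracted from the package is left open). The merge shown is a deliberately coarse, file-level three-way merge, not git's line-level algorithm, and the function names are hypothetical.

```python
from typing import Optional


def outstanding_changes(head: dict[str, bytes], working: dict[str, bytes]) -> dict[str, str]:
    """Compare the working copy against the last commit, similar in spirit to `git status`."""
    status = {}
    for name in sorted(head.keys() | working.keys()):
        if name not in working:
            status[name] = "deleted"
        elif name not in head:
            status[name] = "added"
        elif head[name] != working[name]:
            status[name] = "modified"
    return status


def merge(base: dict[str, bytes], ours: dict[str, bytes],
          theirs: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """File-level three-way merge of two versions sharing the ancestor `base`."""
    merged, conflicts = {}, []
    for name in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        result: Optional[bytes]
        if o == t:      # both sides agree (including both deleting the part)
            result = o
        elif o == b:    # only "theirs" changed this part
            result = t
        elif t == b:    # only "ours" changed this part
            result = o
        else:           # both changed it differently: report, keep ours for now
            conflicts.append(name)
            result = o
        if result is not None:
            merged[name] = result
    return merged, conflicts
```

Because nearly every edit touches content.xml, a file-level merge like this would report conflicts often; a practical tool would need to merge within the XML structure itself.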

Now I'm not suggesting that actually storing a git repository inside a .odt archive would
be a good way to go, partly for efficiency reasons (duplication of the document's entire history
in every copy), and partly because git's storage format is binary and so vastly different from
everything else in ODF. Nonetheless, at a theoretical level, the core idea of storing a version
history separate from the content, from which changes can automatically be detected without
requiring any extensions to the core part of the standard itself, would, I think, be worth
exploring.

I know this is quite a different approach to what you've previously been considering; what
are your thoughts?

Dr. Peter M. Kelly
Founder, UX Productivity

PGP key:
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
