incubator-ooo-dev mailing list archives

From "Dennis E. Hamilton" <dennis.hamil...@acm.org>
Subject Consequences of Working in Office Documents Here
Date Wed, 22 Jun 2011 02:20:13 GMT

BACK STORY

On a different list, not just here on ooo-dev, there has been some surprise to see us putting
binaries (ODF documents) into some SVN locations used by the PPMC. 

My impression is that the experienced hands here in ASF are expecting to see DIFFs in commit
messages on SVN, but binaries don't get DIFFed since the result is usually unintelligible and
almost always uninteresting.  For some, it is news that ODF packages are not XML files.

Someone suggested that one could unpack the Zip of these documents and then do diffs of the
respective XML parts and that could serve as a DIFF on what the changes are.  They also noticed
they'd never seen that done.
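
For concreteness, here is a minimal Python sketch of what that suggestion might look like. It is only an illustration, not an existing tool: the function names are hypothetical, and it assumes the XML parts are UTF-8 and can usefully be split at tag boundaries.

```python
import difflib
import io
import zipfile


def make_package(parts):
    """Build an in-memory Zip from a {name: text} dict (hypothetical demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()


def diff_odf_xml_parts(old_bytes, new_bytes):
    """Return unified-diff lines for every .xml part of two Zip packages."""
    out = []
    with zipfile.ZipFile(io.BytesIO(old_bytes)) as old_zip, \
            zipfile.ZipFile(io.BytesIO(new_bytes)) as new_zip:
        old_names = set(old_zip.namelist())
        new_names = set(new_zip.namelist())
        for name in sorted(old_names | new_names):
            if not name.endswith(".xml"):
                continue  # skip binary blobs such as images
            # Assumption: XML parts are UTF-8 encoded.
            old_text = old_zip.read(name).decode("utf-8") if name in old_names else ""
            new_text = new_zip.read(name).decode("utf-8") if name in new_names else ""
            # The XML is often written as one long line, so break at tag boundaries.
            old_lines = old_text.replace("><", ">\n<").splitlines()
            new_lines = new_text.replace("><", ">\n<").splitlines()
            out.extend(difflib.unified_diff(
                old_lines, new_lines,
                fromfile="old/" + name, tofile="new/" + name, lineterm=""))
    return out
```

Reading two .odt files from disk as bytes and passing them to diff_odf_xml_parts would list the changed lines of content.xml, styles.xml, and so on, at the XML level only.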

THE INSIGHT

On seeing that suggestion (clearly the kind of thing developers think of, it being what
we do), it struck me that we have a "geeks are from Mars, users are from Venus" situation here.

I think the clash of expectations has to do with the differences in tools that are applicable
at the level we work at, and how we see what it is we are at work on.

We need to understand that we really have different experience sets, and they all are important
in the context of the OpenOffice.org project.

A GEEKY LOOK

Here is a geeky explanation of why it does no good to figure out a better way to show DIFFs
of the XML inside an ODF package if you want to know what an author (contributor/committer)
changed.  (You might want that as a forensics tool, but not for knowing what someone changed
in the course of their work on a document.)

My (updated) explanation:

The problem is that diff-ing the XML is not what's wanted.  That's like decompiling two programs
and posting a diff of the assembly language.  (There are also binary blobs -- I said blogs
by mistake in another post -- in the Zipped ODF package.)

The level of abstraction that one cares about when accounting for changes in a document in
one of these formats is the presentation or print-preview level.  There are document compare
utilities that provide such functions.  It's like the comparison you get between two wiki
pages: everywhere I've looked, it is shown as a comparison of the resulting presentation,
not of the WikiText.  (I know that on Apache we have a production process where we use SVN
as a publishing location and see diffs of Markdown, a kind of plaintext markup.  I know that
fits beautifully into the source-code revision developer toolcraft model, but you wouldn't
want to know about changes in an ODF document that way, BECAUSE IT IS NOT WHAT IS AUTHORED.)

There are also change-tracking (historically called red-lining in my experience) provisions
in the ODF format, and the software products handle them with varying degrees of reliability.
This is like showing a kind of merge with the removed text and the inserted text all shown
in the document and distinguished by highlighting and strikethroughs of various forms.  A
reviewer can agree to accept a change or can reject a change, make more changes, etc.
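
As a rough illustration of the forensic view of those provisions, the Python sketch below counts the tracked-change records in the content.xml of a text document. It assumes the changes are recorded as text:changed-region elements in the ODF text namespace, each holding a text:insertion, text:deletion, or text:format-change child; the function name is hypothetical, and real producers may record changes less tidily than this.

```python
import xml.etree.ElementTree as ET

# ODF text namespace, as used for the text: prefix in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def count_tracked_changes(content_xml):
    """Count tracked insertions, deletions and format changes in content.xml text."""
    root = ET.fromstring(content_xml)
    counts = {"insertion": 0, "deletion": 0, "format-change": 0}
    for region in root.iter("{%s}changed-region" % TEXT_NS):
        for child in region:
            # Strip the "{namespace}" prefix to get the local element name.
            kind = child.tag.rsplit("}", 1)[-1]
            if kind in counts:
                counts[kind] += 1
    return counts
```

This only says that changes were recorded and of what kind.  Seeing them as an author does, with highlighting and strikethroughs, still takes the office application itself.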

So there are (at least) two different levels of envisioning, of toolcraft and of work practices
among us.  At one level, there is the world of SVN, compiler and build processes, and source
code in simply-formatted text.  For ODF (and OOXML and other such formats), the XML in the Zip
is object code, not the source code.  The source code counterpart is at quite another level.

Worlds are colliding here on Apache OpenOffice.org.  It is going to be very interesting to see
what we learn from each other and how we manage to function in some kind of shared culture
within the Apache Way.

Some of us navigate both levels with some fluency.  That is not the case for most of us and,
I am learning, not natural for me either: OpenOffice is not my tool of choice apart from using
it as an ODF forensic tool, and my development toolcraft is not SVN, LAMP, etc.

It is very important to grasp this, because if we don't recognize it, the authors of documentation
and people working at the user-issues level are going to be left with no way to fit in and
not much that feels like it is appropriate for their specialized activities.

 - Dennis


