corinthia-dev mailing list archives

From "Dennis E. Hamilton" <dennis.hamil...@acm.org>
Subject RE: Word round trip issue? And round trip in general.
Date Mon, 06 Jul 2015 16:16:03 GMT
Commenting inline, ...

-----Original Message-----
From: Peter Kelly [mailto:pmkelly@apache.org] 
Sent: Monday, July 6, 2015 04:41
To: dev@corinthia.incubator.apache.org
Subject: Re: Word round trip issue? And round trip in general.
[ ... ]

This is one of the few instances where we actually completely replace something in the docx
file every time it is modified. The OPC specifies a set of XML files that indicate relationships
between different “parts” (i.e. files) in a package. They’re used as an alternative
to path names (I don’t know why, it seems unnecessary, but that’s how it’s done in OOXML).

<orcmid>
    The OPC structure allows the interdependencies among package
    parts to be known and managed without understanding the files  
    that have the dependencies.  Also, the OPC model is not
    limited to Zip implementations, so there is the prospect that
    these would be mapped and represented on a server (for example)
    in quite different ways.  Pull/push processing was thought
    to be aided by having the dependencies at the package level and
    by subdividing parts for more efficient interleaved access.
    Most of this is not used for OOXML, AFAIK, but the OPC design
    allowed for it.

    PS: The Office 2016 implementations are supporting concurrent
    shared editing when the documents are in a Microsoft cloud
    service, such as OneDrive, and that makes the server-side
    storage and protocols for its access interesting too.  I have
    no idea what they are, just that MSFT is moving rapidly to
    enable this sort of thing.
</orcmid>

I think there are two likely possibilities:

1. OpenOffice is too strict in what it accepts from the OPC relationship files, and handles
only a subset of possible valid relationships (presumably whatever MS Office writes out).

2. Corinthia is too liberal in writing out the relationships, in that it does so in a way
that, while accepted by MS Office and some other apps, isn’t strictly in accordance with
the spec.

I suspect it’s likely the former, but I’m not infallible and it could be the latter ;)

If you unzip a .docx file, have a look at the files in _rels and word/_rels - these are the
OPC files that would differ and are likely what OO for whatever reason is struggling with.
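
One way to inspect those files without unzipping by hand is sketched below. This is only
an illustration, not anything Corinthia or OpenOffice does: the function name
list_relationships is hypothetical, and the sketch assumes the relationship parts use the
standard OPC relationships namespace and that a missing TargetMode means "Internal".

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship parts (assumed from ECMA-376 Part 2).
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def list_relationships(docx):
    """Hypothetical helper: map each .rels part in the package to its
    relationships as (Id, Type, Target, TargetMode) tuples."""
    result = {}
    with zipfile.ZipFile(docx) as package:
        for name in package.namelist():
            # Relationship parts live in _rels folders and end in .rels,
            # e.g. _rels/.rels and word/_rels/document.xml.rels.
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(package.read(name))
            result[name] = [
                (
                    rel.get("Id"),
                    rel.get("Type"),
                    rel.get("Target"),
                    # Assumption: an absent TargetMode is treated as Internal.
                    rel.get("TargetMode", "Internal"),
                )
                for rel in root.findall(f"{{{REL_NS}}}Relationship")
            ]
    return result


if __name__ == "__main__":
    import sys

    # Usage: python list_rels.py some-document.docx
    for part, rels in list_relationships(sys.argv[1]).items():
        print(part)
        for rel in rels:
            print("   ", rel)
```

Running this on the same document as written by Word and as written by Corinthia, and
comparing the two outputs, should show where the relationship files differ.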

<orcmid>
    It is good to look at the OPC specification.  This is part 
    of the OOXML specification although it is designed for 
    independent use, and it is so used. The easy way to get 
    the spec is to download ECMA-376, latest edition, Part 2.
    See <http://www.ecma-international.org/publications/standards/Ecma-376.htm>.

    The original format (called Metro) was in fact for very 
    large final-form documents that were amenable to accessing 
    by pull requests from high-end publishing engines (the 
    format that later became known as XPS).

    There are free-standing implementations of OPC processors, 
    including one in C on SourceForge.  A .net version has 
    been open-sourced and I assume that there is a Java version 
    at Apache POI.  I can't speak to their quality.  I can't 
    speak to the quality of the OpenOffice processing of 
    OPC-carried OOXML either.

    These might be the basis for tests and they might also be 
    useful sources of ideas for how to disassemble and reassemble
    OOXML documents.

    PS: There have been some changes in OPC, as there have been 
    in the OOXML specifications, so you may have to distinguish 
    documents that honor older specifications and others that 
    reflect breaking changes.  
</orcmid>

[ ... ]


