corinthia-dev mailing list archives

From Ian C
Subject Re: What Google does to odf documents
Date Thu, 11 Jun 2015 01:56:15 GMT
On Thu, Jun 11, 2015 at 4:57 AM, Dennis E. Hamilton wrote:

> Thanks, Ian.
> This is an important test procedure -- taking a test document and seeing
> not only how well it is accepted but also what is re-saved by a particular
> processor.

It is an interesting exercise to see how the different document structures
are handled. Are there any rules or guidelines defined? Or is a document
simply deemed okay as long as it is valid with respect to the schema
definition?

> The Google Docs approach seems pretty ridiculous.  It has basically broken
> up the "Hello world" paragraph to have a paragraph style and to also have a
> differently-named automatic text style on each fragment: "Hello", " ", and
> "world".  That is pretty ridiculous.
> So these forms of documents will be encountered in the wild when a Google
> Doc is exported for interchange as an ODF document.  The bloat should be
> quite remarkable.  Not just in the text but in the definitions of automatic
> styles.

Yes, I tried it with other documents with more words and that is exactly
what happens. It is just weird. I reread the original hello world document
using OpenOffice and it renders fine. When OpenOffice saves the document,
some correction is applied: the automatic styles (all of which are
identical!) are merged into one. So each word still has an automatic
style, but they all reference the same one.
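
To make that merge step concrete, here is a minimal Python sketch of how
identical automatic styles could be collapsed. It is not what OpenOffice
actually does internally, just an illustration. The style properties in
SAMPLE (font size and family) are assumptions, since the thread does not
show the style definitions, and merge_identical_styles and style_key are
hypothetical names.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
STYLE_NAME = "{%s}name" % NS["style"]
TEXT_STYLE_NAME = "{%s}style-name" % NS["text"]

# Assumed content: the span structure is from the thread, the style
# definitions are invented for illustration.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:automatic-styles>
    <style:style style:name="P1" style:family="paragraph">
      <style:paragraph-properties fo:margin-top="0pt"/>
    </style:style>
    <style:style style:name="T1_1" style:family="text">
      <style:text-properties fo:font-size="11pt" fo:font-family="Arial"/>
    </style:style>
    <style:style style:name="T1_2" style:family="text">
      <style:text-properties fo:font-size="11pt" fo:font-family="Arial"/>
    </style:style>
    <style:style style:name="T1_3" style:family="text">
      <style:text-properties fo:font-size="11pt" fo:font-family="Arial"/>
    </style:style>
  </office:automatic-styles>
  <office:body>
    <office:text>
      <text:p text:style-name="P1">
        <text:span text:style-name="T1_1">Hello</text:span>
        <text:span text:style-name="T1_2"><text:s/></text:span>
        <text:span text:style-name="T1_3">world</text:span>
      </text:p>
    </office:text>
  </office:body>
</office:document-content>"""


def style_key(style):
    """Hashable description of a style's content, ignoring its name.

    Only the style's own attributes and its direct children are compared;
    deeper nesting (e.g. tab stops) is ignored in this sketch.
    """
    attrs = tuple(sorted((k, v) for k, v in style.attrib.items()
                         if k != STYLE_NAME))
    children = tuple((child.tag, tuple(sorted(child.attrib.items())))
                     for child in style)
    return attrs, children


def merge_identical_styles(root):
    """Collapse identical automatic styles; return {old name: kept name}."""
    auto = root.find("office:automatic-styles", NS)
    seen = {}      # style content -> first style name with that content
    renamed = {}   # dropped style name -> surviving style name
    for style in list(auto.findall("style:style", NS)):
        key = style_key(style)
        name = style.get(STYLE_NAME)
        if key in seen:
            renamed[name] = seen[key]
            auto.remove(style)
        else:
            seen[key] = name
    # Point every text:style-name reference at the surviving style.
    for elem in root.iter():
        ref = elem.get(TEXT_STYLE_NAME)
        if ref in renamed:
            elem.set(TEXT_STYLE_NAME, renamed[ref])
    return renamed
```

Calling merge_identical_styles(ET.fromstring(SAMPLE)) leaves one text
style in place and repoints the three spans at it, which matches the
"each word still has a style, but all the same one" result described
above. It does not merge the adjacent spans themselves.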

I assume the other major players like LibreOffice and Calligra, which are
all derived from the same source tree, will behave the same? Then again, my
kids tell me the old joke about assume: it makes an ass out of u and me.

On a more direct note, I wonder how we would process such a document. We are
not far enough along to generate CSS data from automatic styles yet, but a
big document following that pattern would generate loads of useless CSS
rules. That suggests we need to look at the content of the rules, compare
them, and optimise. A possible step for the future; baby steps first...
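
As a rough idea of what "compare and optimise" could mean, here is a small
Python sketch that groups CSS rules by their declarations so that identical
rules are emitted once with a combined selector. Nothing like this exists
in Corinthia yet as far as I know; group_css_rules, the class selectors,
and the property values are all hypothetical.

```python
def group_css_rules(rules):
    """Emit one CSS rule per distinct set of declarations.

    `rules` maps a selector to a dict of property -> value. Selectors
    whose declarations are identical (regardless of property order) are
    combined into a single comma-separated selector list.
    """
    grouped = {}
    for selector, decls in rules.items():
        key = tuple(sorted(decls.items()))
        grouped.setdefault(key, []).append(selector)

    lines = []
    for key, selectors in grouped.items():
        body = " ".join(f"{prop}: {value};" for prop, value in key)
        lines.append(f"{', '.join(selectors)} {{ {body} }}")
    return "\n".join(lines)


# Assumed input: one class per automatic style, as a naive converter
# might produce from a Google-exported document.
rules = {
    ".T1_1": {"font-size": "11pt", "font-family": "Arial"},
    ".T1_2": {"font-family": "Arial", "font-size": "11pt"},
    ".T1_3": {"font-size": "11pt", "font-family": "Arial"},
    ".P1": {"margin-top": "0pt"},
}
css = group_css_rules(rules)
```

With that input the three text styles collapse into a single rule and the
paragraph style keeps its own. Grouping selectors keeps the class names in
the HTML untouched; rewriting the class attributes to one shared name would
shrink the output further but is a bigger change.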

>  - Dennis
> -----Original Message-----
> From: [] On Behalf Of Ian C
> Sent: Wednesday, June 10, 2015 04:09
> To: dev
> Subject: What Google does to odf documents
> Hi All,
> one of the things my tool does is compare the structure of documents:
> different versions of the same one as a user adds to it, etc.
> I took a simple "Hello World" document and stored it in Google Docs. I then
> downloaded it back again, with no edits, and compared the two.
> They are radically different. Just consider the document body.
> Original....
>     <office:body>
>         <office:text>
>             <text:sequence-decls>
>                 <text:sequence-decl text:display-outline-level="0"
>                     text:name="Illustration" />
>                 <text:sequence-decl text:display-outline-level="0"
>                     text:name="Table" />
>                 <text:sequence-decl text:display-outline-level="0"
>                     text:name="Text" />
>                 <text:sequence-decl text:display-outline-level="0"
>                     text:name="Drawing" />
>             </text:sequence-decls>
>             <text:p text:style-name="Text_20_body">Hello world </text:p>
>         </office:text>
>     </office:body>
> When downloaded.
>     <office:body>
>         <office:text>
>             <text:p text:style-name="P1">
>                 <text:span text:style-name="T1_1">Hello</text:span>
>                 <text:span text:style-name="T1_2">
>                     <text:s />
>                 </text:span>
>                 <text:span text:style-name="T1_3">world</text:span>
>             </text:p>
>         </office:text>
>     </office:body>
> It lost the text:sequence-decls... no harm there. Not really sure what they
> were. But look at the simple text paragraph. It gets blown out to a span
> around each word with its own style! Even the space between the words has
> its own style!
> I'm sure there is some smart reason for this. I don't understand what it
> is.
> Let's hope we can do a better job with the round trip of a document in
> Corinthia.
> Then again maybe we will discover that is what we have to do?
> --
> Cheers,
> Ian C


Ian C
