corinthia-dev mailing list archives

From Peter Kelly <>
Subject Re: ODF filter
Date Thu, 08 Jan 2015 15:59:47 GMT
> On 8 Jan 2015, at 10:16 am, Dave Fisher <> wrote:
> Hi Peter,
> This is a helpful email. From your concrete discussion I can better understand the mapping
> between the abstract / HTML model and the concrete / DOCX, ODT.
> You mention differences in the style runs for Word and ODT, with which I am familiar from
> the OOXML side. Does the abstract model / HTML take a particular approach towards style runs?
> Is there a concrete version of the HTML model? Is there a specification or plan for the abstract

As a general principle, no - a given filter is expected to handle arbitrary HTML.

However, there is a function for “normalising” an HTML document to change nested sets of
inline elements (span, b, i, etc.) into a flat sequence of runs (each represented as a span
element). The Word filter uses this, due to Word’s flat model of inline runs.
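
To make the idea of flattening concrete, here is a minimal sketch in Python. It is not the
actual normalisation function; the names collect_runs, flatten_paragraph and the
INLINE_STYLES table are hypothetical and chosen only for illustration. It assumes a
well-formed paragraph whose children are all inline elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from inline tags to CSS declarations. This table is an
# assumption made for the sketch, not the one the Word filter uses.
INLINE_STYLES = {
    "b": "font-weight: bold",
    "i": "font-style: italic",
    "u": "text-decoration: underline",
}

def collect_runs(elem, inherited, runs):
    """Walk elem depth-first, appending (text, styles) pairs in document order."""
    if elem.text:
        runs.append((elem.text, inherited))
    for child in elem:
        extra = []
        css = INLINE_STYLES.get(child.tag)  # unknown tags add no formatting
        if css:
            extra.append(css)
        if child.get("style"):  # e.g. <span style="...">
            extra.append(child.get("style"))
        collect_runs(child, inherited + tuple(extra), runs)
        if child.tail:
            # Text after the child belongs to the parent's formatting.
            runs.append((child.tail, inherited))
    return runs

def flatten_paragraph(p):
    """Return a copy of p whose content is a flat sequence of span elements."""
    flat = ET.Element(p.tag, p.attrib)
    for text, styles in collect_runs(p, (), []):
        span = ET.SubElement(flat, "span")
        if styles:
            span.set("style", "; ".join(styles))
        span.text = text
    return flat

source = ET.fromstring("<p>plain <b>bold <i>both</i></b> end</p>")
result = ET.tostring(flatten_paragraph(source), encoding="unicode")
print(result)
```

Each run ends up as a sibling span carrying the combined formatting of all of its former
ancestors, which is the shape a flat run model such as Word's expects.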

ODF text documents, on the other hand, *do* support nested formatting runs, so when writing
the ODF filter it may make sense not to apply the normalisation process used in the Word filter.
That would be the right choice if there is information that could not be represented in HTML
and would be lost by flattening the structure, as we do for Word.
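
For comparison, a rough sketch of a nesting-preserving conversion, again in Python and again
purely illustrative. The text namespace URI and the text:span / text:style-name names come
from the ODF format; everything else (the function names, mapping a style name to an HTML
class attribute) is an assumption of this sketch, and inline content other than text:span is
ignored apart from its trailing text.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = f"{{{TEXT_NS}}}span"
STYLE_NAME = f"{{{TEXT_NS}}}style-name"

def append_text(parent, text):
    """Append text at the end of parent's content, after any existing children."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def odf_inline_to_html(odf_elem, html_tag):
    """Convert an ODF paragraph or span to HTML, keeping nested spans nested."""
    html = ET.Element(html_tag)
    style = odf_elem.get(STYLE_NAME)
    if style:
        # Assumption: carry the ODF style name over as an HTML class.
        html.set("class", style)
    append_text(html, odf_elem.text)
    for child in odf_elem:
        if child.tag == SPAN:
            html.append(odf_inline_to_html(child, "span"))
        # Other inline children are skipped in this sketch; their tail text is kept.
        append_text(html, child.tail)
    return html

odf = ET.fromstring(
    f'<text:p xmlns:text="{TEXT_NS}">plain '
    '<text:span text:style-name="T1">bold '
    '<text:span text:style-name="T2">both</text:span></text:span> end</text:p>'
)
html = ET.tostring(odf_inline_to_html(odf, "p"), encoding="unicode")
print(html)
```

Here the inner span stays a child of the outer one, so the relationship between the two
style names survives the conversion instead of being merged into one flat run.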

There have been a few times where the topic of what internal representation we should use has
been raised - whether we should stick with HTML, come up with our own entirely different model,
or something else. I personally think HTML is a good choice, but perhaps for those who have
raised the issue of an alternate intermediate form, this might be a good time to start that
discussion ;)

Dr Peter M. Kelly

PGP key: <>
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
