corinthia-dev mailing list archives

From jan i <>
Subject Re: Corinthia Document Model (was RE: ODF filter)
Date Thu, 08 Jan 2015 17:02:26 GMT
On 8 January 2015 at 17:40, Dennis E. Hamilton <> wrote:

>  -- reply below to --
> From: jan i []
> Sent: Thursday, January 8, 2015 08:12
> To:
> Subject: Re: ODF filter
> On 8 January 2015 at 16:59, Peter Kelly <> wrote:
> [ ... ]
> > As a general principle, no - a given filter is expected to handle
> > arbitrary HTML.
> >
> > However, there is a function for “normalising” a HTML document to change
> > nested sets of inline elements (span, b, i, etc.) into a flat sequence of
> > runs (each represented as a span element). The Word filter uses this, due
> > to Word’s flat model of inline runs.
> >
> > ODF text documents, on the other hand, *do* support nested formatting
> > runs, so when writing this filter it may make sense not to apply the
> > normalisation process used in the Word filter. This should be done if
> > there is information that could not be represented in HTML and would be
> > lost by flattening the structure like we do for Word.
> >
> > There have been a few times when the topic of what internal
> > representation we should use has been raised - whether we should stick
> > with HTML, come up with our own entirely different model, or something
> > else. I personally think HTML is a good choice, but perhaps for those who
> > have raised the issue of an alternate intermediate form, this might be a
> > good time to start that discussion ;)
> >
> Point taken; I am, I assume, the first who questioned it. But just to be
> precise, I am happy having HTML as the internal structure, but I am unhappy
> that filters can do what they like with the HTML. My goal is to define a
> set of access functions that filters should use to navigate/insert/delete
> tags, and restrictions on what can be put in the tags. Just imagine one
> filter needs to id some tags and therefore uses id=, while another filter
> needs to name some tags and therefore uses name=. If we are not careful
> here it will explode, and reading HTML becomes nearly as complicated as
> reading the formats directly. We should have 1 and only 1 HTML definition,
> which the filters can use.
> rgds
> jan I.
> <orcmid>
>   I'm not following this well.
>   Let me ask it this way: Are we talking about fixing some sort of DOM over
>   the HTML5 or are we allowing arbitrary HTML5 and transforming to and from
>   it?
>   I am having trouble visualizing this process -- is the intermediate
>   concrete HTML and not some DOM view?

You are not the only one; it took me quite a few evenings to get just a bit
into the code.

Without polluting this with all the function calls, let me try to explain
how I see the current source (peter@ please correct me if I am wrong).

A filter can in principle inject any HTML5 string into the data model. Core
delivers functions to manipulate the HTML5 model, but does not control what
a filter writes.

Meaning that if a filter wants to write
"<p style=janPrivate, idJan=nogo>foo</p>" to the data, it can do that. The
problem is that all the other filters then need to understand this when
reading the data and generating their format.

My idea is that core should provide functions like (just an example)
   addParagraph(*style, *id, *text)
Doing that means a filter cannot write arbitrary HTML5, but only what is
"allowed". If a filter needs a new capability, core would be extended in a
controlled fashion and all filters updated.
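
To make the idea concrete, here is a minimal sketch in C of what such an
access function could look like. Everything in it is an assumption made for
illustration and not existing Corinthia code: the output buffer parameters,
the list of allowed styles, the rule for valid ids, and the choice to emit
the style as a class attribute are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical whitelist: the only paragraph styles core would accept. */
static const char *const ALLOWED_STYLES[] = { "Normal", "Heading1", "Quote" };

static int style_is_allowed(const char *style)
{
    size_t count = sizeof(ALLOWED_STYLES) / sizeof(ALLOWED_STYLES[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(style, ALLOWED_STYLES[i]) == 0)
            return 1;
    }
    return 0;
}

/* Assumed rule: ids are non-empty and use only ASCII letters, digits, '-', '_'. */
static int id_is_valid(const char *id)
{
    if (*id == '\0')
        return 0;
    for (const char *p = id; *p != '\0'; p++) {
        char c = *p;
        int ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                 (c >= '0' && c <= '9') || c == '-' || c == '_';
        if (!ok)
            return 0;
    }
    return 1;
}

/* Append src to out (capacity cap, current length *len); -1 if it does not fit. */
static int append(char *out, size_t cap, size_t *len, const char *src)
{
    size_t n = strlen(src);
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, src, n + 1);   /* copies the terminating NUL too */
    *len += n;
    return 0;
}

/* Append text with the HTML special characters &, < and > escaped. */
static int append_escaped(char *out, size_t cap, size_t *len, const char *text)
{
    for (const char *p = text; *p != '\0'; p++) {
        char plain[2] = { *p, '\0' };
        const char *piece = plain;
        if (*p == '&') piece = "&amp;";
        else if (*p == '<') piece = "&lt;";
        else if (*p == '>') piece = "&gt;";
        if (append(out, cap, len, piece) != 0)
            return -1;
    }
    return 0;
}

/*
 * Sketch of the proposed access function. Writes one <p> element into out.
 * Returns 0 on success; returns -1 and leaves out empty if the style is not
 * whitelisted, the id is malformed, or the buffer is too small.
 */
int addParagraph(char *out, size_t cap,
                 const char *style, const char *id, const char *text)
{
    assert(out != NULL && style != NULL && id != NULL && text != NULL);
    if (cap == 0)
        return -1;
    out[0] = '\0';
    if (!style_is_allowed(style) || !id_is_valid(id))
        return -1;

    size_t len = 0;
    if (append(out, cap, &len, "<p class=\"") != 0 ||
        append(out, cap, &len, style) != 0 ||
        append(out, cap, &len, "\" id=\"") != 0 ||
        append(out, cap, &len, id) != 0 ||
        append(out, cap, &len, "\">") != 0 ||
        append_escaped(out, cap, &len, text) != 0 ||
        append(out, cap, &len, "</p>") != 0) {
        out[0] = '\0';
        return -1;
    }
    return 0;
}
```

With this shape, a call using the private style from the example above
(janPrivate) is simply rejected, and the only way to make it acceptable is to
extend the whitelist in core, where every filter author can see the change.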

>   This relates to how inter-conversion is to be tested.  Is there some
>   abstraction against which document features are assessed and mapped
>   through or are we working concrete level to/from concrete level and
>   that is essentially it?
I don't think we should test inter-conversion as such. It is much more
efficient to test format xyz <-> HTML5. And if our usage of HTML5 is defined
(and restricted), it should work.
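
One way to read the efficiency argument (an assumption, it is not spelled
out in this thread) is simply to count the conversions that need tests. The
two helper functions below are hypothetical and only do that arithmetic.

```c
/* Directed conversions to cover when every format is tested against every other. */
unsigned long pairwise_conversions(unsigned long formats)
{
    return formats == 0 ? 0 : formats * (formats - 1);
}

/* Conversions to cover when each format is only tested to and from HTML5. */
unsigned long html_conversions(unsigned long formats)
{
    return 2 * formats;
}
```

For 5 formats this gives 20 pairwise conversions against 10 via HTML5. Below
4 formats there is no saving in the count, so the gain only shows up as more
filters are added.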

>   Help me calibrate my understanding of the thrust.
Hope it helps a bit... if not, please ask again, because this is a really
crucial point we all need to agree on.

jan i

> </orcmid>
