corinthia-dev mailing list archives

From Peter Kelly <>
Subject Re: ODF branch: The confused edition.
Date Sat, 23 May 2015 18:00:02 GMT
> On 23 May 2015, at 6:36 am, Gabriela Gibson <> wrote:
> Hi,
> Well, I managed to get (rudimentary) headers, tables, lists working, but
> the bold, italic and underlined nodes have me confused, mostly because
> nothing appears in the order I expect it to.  I used the
> file sample/documents/odf/bold-italic-underlined.odt for that part.

I haven’t yet gone and run the code, but looking at the approach used in traverseContent()
where it calls find_HTML() to determine the corresponding HTML tag for a given ODF node, I
don’t think this is going to work for a lot of the constructs in ODF. The reason is that
it’s not always a simple mapping - for some basic constructs like paragraphs and (some)
tables it will work, but there will be other cases where more complex processing is needed.
So I think, at least for the time being (the very interesting DSL ideas we’ve been discussing
notwithstanding), a first cut that has one big switch statement for all the supported node
types is more likely to be successful. This way, you can do any arbitrary processing you need
for a given node type, and are not restricted to simply mapping it to a particular HTML element.
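
As a rough sketch of the switch-based approach: the NodeKind enum, the OdfNode struct, and
htmlNameForNode() below are hypothetical names invented for this example, not code from the
repository. A real version would switch on the ODF tag constants. The heading case shows the
kind of processing a plain one-to-one lookup cannot express.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical node kinds; real code would use the ODF tag constants. */
typedef enum {
    NODE_TEXT_P,
    NODE_TEXT_H,
    NODE_TEXT_SPAN,
    NODE_TABLE,
    NODE_UNKNOWN
} NodeKind;

/* Hypothetical, simplified node for this sketch. */
typedef struct {
    NodeKind kind;
    int outlineLevel;   /* only meaningful for NODE_TEXT_H */
} OdfNode;

/* Writes the HTML element name for the node into out.
   Returns 1 if the node type is supported, 0 otherwise. */
static int htmlNameForNode(const OdfNode *node, char *out, size_t outSize)
{
    assert(node != NULL && out != NULL && outSize > 0);
    switch (node->kind) {
    case NODE_TEXT_P:
        snprintf(out, outSize, "p");
        return 1;
    case NODE_TEXT_H: {
        /* Not a fixed mapping: the outline level selects h1..h6. */
        int level = node->outlineLevel;
        if (level < 1)
            level = 1;
        if (level > 6)
            level = 6;
        snprintf(out, outSize, "h%d", level);
        return 1;
    }
    case NODE_TEXT_SPAN:
        snprintf(out, outSize, "span");
        return 1;
    case NODE_TABLE:
        snprintf(out, outSize, "table");
        return 1;
    default:
        return 0;   /* unsupported node type */
    }
}
```

Each case is free to do whatever that node type requires (inspect attributes, recurse into
children, emit several elements), which is the point of preferring the switch for a first cut.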

In terms of the formatting nodes like those for italic and bold, I would suggest instead building
up a set of CSS properties rather than creating HTML tags for <b>, <i>, and <u>.
The reason for this is that there are only a few such tags in HTML, but there are many other
formatting properties that can’t be expressed in this manner and instead use CSS. A <span>
element with a style="font-weight: bold" attribute is equivalent to <b>, and there’s
some code somewhere in the html directory which from memory I think converts between the two.
So creating a CSSProperties object and setting the relevant name/value pairs in that will
enable you to serialise the result and place that in a span tag.
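
To make the idea concrete, here is a minimal sketch of collecting name/value pairs and
serialising them into the text of a style attribute. PropSet, propSetPut() and
propSetSerialize() are hypothetical stand-ins written for this example; they are not the
DocFormats CSSProperties API, which should be used in the real code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROPS 16

/* Hypothetical stand-in for a CSS property set (not the DocFormats type). */
typedef struct {
    const char *names[MAX_PROPS];
    const char *values[MAX_PROPS];
    int count;
} PropSet;

/* Adds a property, or overwrites it if the name is already present.
   Strings are borrowed, not copied. */
static void propSetPut(PropSet *set, const char *name, const char *value)
{
    assert(set != NULL && name != NULL && value != NULL);
    for (int i = 0; i < set->count; i++) {
        if (strcmp(set->names[i], name) == 0) {
            set->values[i] = value;
            return;
        }
    }
    assert(set->count < MAX_PROPS);
    set->names[set->count] = name;
    set->values[set->count] = value;
    set->count++;
}

/* Serialises as "name: value; name: value". Stops before the first
   property that does not fit. Returns the number of characters written. */
static size_t propSetSerialize(const PropSet *set, char *out, size_t outSize)
{
    size_t used = 0;
    assert(set != NULL && out != NULL && outSize > 0);
    out[0] = '\0';
    for (int i = 0; i < set->count; i++) {
        int n = snprintf(out + used, outSize - used, "%s%s: %s",
                         (i > 0) ? "; " : "", set->names[i], set->values[i]);
        if (n < 0 || (size_t)n >= outSize - used) {
            out[used] = '\0';   /* drop the partially written property */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

A text span that is bold, italic and underlined would then put font-weight, font-style and
text-decoration into one set and emit a single <span> with the serialised result, rather than
three nested elements.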

The other reason the CSS approach is more appropriate is that it can also be used for stylesheets.
For automatic styles in ODF, we want to translate those to style="…" attributes in HTML
(that is, direct formatting, which is essentially what automatic styles are). However for
normal styles, we want an entry in the CSS stylesheet, and then reference that from the HTML
element via the class="…" attribute.
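
The distinction can be sketched as follows. StyleRef and formatStyleAttribute() are
hypothetical names for illustration only, and the sketch assumes the properties have already
been serialised to CSS text; escaping of attribute values is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified view of an ODF style for this sketch. */
typedef struct {
    const char *name;     /* style name, used as the CSS class for normal styles */
    const char *cssText;  /* already-serialised properties, e.g. "font-weight: bold" */
    int isAutomatic;      /* nonzero for automatic styles */
} StyleRef;

/* Writes either style="..." (automatic style, i.e. direct formatting)
   or class="..." (normal style, defined in the stylesheet) into out.
   Returns 1 on success, 0 if the buffer is too small.
   Attribute-value escaping is deliberately omitted. */
static int formatStyleAttribute(const StyleRef *style, char *out, size_t outSize)
{
    int n;
    assert(style != NULL && out != NULL && outSize > 0);
    if (style->isAutomatic)
        n = snprintf(out, outSize, "style=\"%s\"", style->cssText);
    else
        n = snprintf(out, outSize, "class=\"%s\"", style->name);
    return (n >= 0 && (size_t)n < outSize);
}
```

For a normal style, the same CSS text would additionally be written once into the stylesheet
under a selector for that class.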

Have a look in the ooxml/src/word/formatting directory for how this is handled in the Word
filter. This takes an XML node from the Word document as input, and populates a CSSProperties
object with the appropriate values. There are also functions to go the other way, when performing
an update. I would recommend an approach similar to this.

Coming back to HTML_B and friends: I just had a look at HTMLNormalization.c and it looks like
it only does this in the inverse situation to what I described above. That is, when reading
an HTML file and preparing it for conversion into a Word document, it converts <b>, <i>,
<u> etc into <span> tags with the appropriate CSS properties set in the style
attribute. It doesn’t go the other way, though that could potentially be done. Both approaches
are essentially identical anyway in terms of how they will render in a browser and be treated
by the editor.

> I also had to do some surgery on DocFormats/core/src/xml/DFNameMap.* so I
> could access DFNameMap.

It isn’t actually necessary to put this stuff in the header - it’s best to keep the struct
definition in the C file and only ever access it through the functions exposed in the header.
If you’re not accessing any of the fields of DFNameMap (which you’re not, at least in
the code currently in the repository), then the compiler simply needs to know that there exists
a struct type called DFNameMap, without knowing what its fields actually are. The following
line in DFNameMap.h declares the typedef:

typedef struct DFNameMap DFNameMap;

Everything you need to do with the name map can be achieved with the public functions - and
in the event you find something that can’t be done, it’s better to add a new function than to expose the struct.
Though this shouldn’t be necessary; if you find such situations let me know and I’ll explain
how to do it with the existing functions :)
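
For reference, the general pattern looks like this. NameTable and its functions are
hypothetical names made up for this sketch, not the actual DFNameMap API; both halves are
shown in one block, with comments marking what would live in the header and what would stay
in the C file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- Would go in the header (hypothetical NameTable.h) ---- */
typedef struct NameTable NameTable;   /* incomplete type: callers never see the fields */
NameTable *NameTableNew(void);
int NameTableAdd(NameTable *table, const char *name);       /* returns id, or -1 if full */
const char *NameTableGet(const NameTable *table, int id);   /* returns NULL if id is invalid */
void NameTableFree(NameTable *table);

/* ---- Would stay in the C file (hypothetical NameTable.c) ---- */
#define NAME_TABLE_MAX 32

struct NameTable {
    const char *names[NAME_TABLE_MAX];   /* borrowed strings, not copied */
    int count;
};

NameTable *NameTableNew(void)
{
    return calloc(1, sizeof(NameTable));   /* NULL on allocation failure */
}

int NameTableAdd(NameTable *table, const char *name)
{
    assert(table != NULL && name != NULL);
    for (int i = 0; i < table->count; i++) {
        if (strcmp(table->names[i], name) == 0)
            return i;                      /* already present: reuse its id */
    }
    if (table->count >= NAME_TABLE_MAX)
        return -1;
    table->names[table->count] = name;
    return table->count++;
}

const char *NameTableGet(const NameTable *table, int id)
{
    assert(table != NULL);
    if (id < 0 || id >= table->count)
        return NULL;
    return table->names[id];
}

void NameTableFree(NameTable *table)
{
    free(table);
}
```

Code that includes only the header can hold and pass around a NameTable pointer, but any
attempt to touch a field fails to compile, so the layout can change without affecting callers.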

Dr Peter M. Kelly

PGP key: <>
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
