corinthia-dev mailing list archives

From Peter Kelly <pmke...@apache.org>
Subject Re: html ids
Date Fri, 19 Jun 2015 16:20:15 GMT
> On 18 Jun 2015, at 4:44 am, Franz de Copenhague <franzdecopenhague@outlook.com> wrote:
> 
> I think I commented on this previously: use a data-* attribute to persist the
> DFNode sequence number, instead of the HTML id. Using the id is a limitation for
> the client app, which cannot manipulate the HTML id for its own purposes.
> 
> http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#embedding-custom-non-visible-data-with-the-data-attributes
> 

I think in principle either way would work fine. The id attribute is supposed to be unique
across all elements in a document, and I would expect most programs that manipulate the HTML
to keep the id attributes as-is (that’s just an educated guess; it’s not guaranteed).

We also use the ids for cross-references (e.g. if you have a labeled figure, and a hyperlink
saying “See Figure 1”), so at minimum we need to keep them for those purposes (though
that could be considered a separate requirement from that of identifiers for bi-directional
transformation).
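
To make the two roles concrete, here is a minimal Python sketch of what the data-* variant might look like. The attribute name data-seq, the sample markup and the SeqCollector class are all assumptions made up for illustration; nothing here is taken from the Corinthia code. The id stays free for cross-references (or for the client app), while the sequence number travels in its own attribute.

```python
from html.parser import HTMLParser


class SeqCollector(HTMLParser):
    """Collects elements that carry a (hypothetical) data-seq attribute."""

    def __init__(self):
        super().__init__()
        self.by_seq = {}  # sequence number -> (tag name, id or None)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        seq = attr.get("data-seq")  # assumed attribute name, not Corinthia's
        if seq is not None:
            self.by_seq[int(seq)] = (tag, attr.get("id"))


# Sample markup: the figure keeps its id for the "See Figure 1" link,
# and both block elements carry an independent sequence number.
html_text = (
    '<figure id="fig1" data-seq="12"><img src="a.png"></figure>'
    '<p data-seq="13">See <a href="#fig1">Figure 1</a></p>'
)

collector = SeqCollector()
collector.feed(html_text)
print(collector.by_seq)
```

The same caveat applies as for ids: this only works if whatever edits the HTML leaves the attribute in place.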

Ultimately what I’d like to achieve though is to avoid the need for the id attributes for
BDT purposes entirely, because there will probably be some use cases where it’s not possible
to maintain them. For example, someone converts a Word document to Markdown, because that’s
what they prefer to use. After modifying the file, they update the Word document and send
it back to their “unenlightened” colleague. Markdown doesn’t support id attributes so
we can’t rely on those to work out which parts of the Markdown file (reconstituted internally
as an HTML file prior to the update actually taking place) correspond to which parts of the
original Word document.

The strategy I think we could use here is to essentially do a diff - but it has to be more
intelligent than a simple line-based diff, because of the tree structure. I experimented with
this a while back using the Myers diff algorithm [1] (which assumes a sequence of items, not a
tree). However, my attempts to modify it to deal with trees were unsuccessful. There’s been
some other research done on tree diff algorithms that I haven’t had a chance to look into
yet, but I’m hopeful we may be able to find or develop a suitable algorithm, at least for
the case of languages like Markdown as in the use case above.
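
As a rough illustration of the sequence-based approach, here is a minimal Python sketch of the greedy edit-distance search from [1], applied to a tree that has been flattened in preorder. The (label, children) tuple representation, the flatten helper and the sample trees are assumptions for illustration only; this is not the code from my earlier experiments, and it is not a tree diff.

```python
def myers_distance(a, b):
    """Length of the shortest insert/delete edit script turning a into b."""
    n, m = len(a), len(b)
    v = {1: 0}  # furthest x reached so far on each diagonal k = x - y
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Step down (insertion) or right (deletion), whichever
            # starts from the diagonal that has reached further.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the "snake" of matching items for free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m


def flatten(node, depth=0):
    """Preorder list of (depth, label) pairs for a (label, children) tree."""
    label, children = node
    items = [(depth, label)]
    for child in children:
        items.extend(flatten(child, depth + 1))
    return items


# Hypothetical before/after documents: a paragraph is removed and a
# list item is added.
old = ("body", [("h1", []), ("p", []), ("ul", [("li", []), ("li", [])])])
new = ("body", [("h1", []), ("ul", [("li", []), ("li", []), ("li", [])])])

print(myers_distance(flatten(old), flatten(new)))
```

Flattening keeps the depth of each node but loses the parent/child relationships, so a subtree that is moved or re-nested shows up as unrelated deletions and insertions. That is the kind of case a real tree diff would need to handle.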

[1] http://www.xmailserver.org/diff2.pdf

—
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

