corinthia-dev mailing list archives

From Franz de Copenhague <franzdecopenha...@outlook.com>
Subject RE: html ids
Date Wed, 17 Jun 2015 21:44:59 GMT

>-----Original Message-----
>From: Peter Kelly [mailto:pmkelly@apache.org]
>Sent: Wednesday, June 17, 2015 3:54 PM
>To: dev@corinthia.incubator.apache.org
>Subject: Re: html ids
>
>> On 17 Jun 2015, at 8:09 pm, Ian C <ian@amham.net> wrote:
>>
>> Hi Peter,
>>
>> when the Word converter creates an html element via the
>> WordConverterCreateAbstract function it creates an associated id attribute.
>>
>> Having examined the resulting html I see each element does have an id.
>>
>> Are these necessary and if so when and where? I'm guessing some sort
>> of lookup function somewhere?
>
>The id attributes are used for two purposes:
>
>1. To enable elements in an updated version of the document to be
>correlated with the elements from the original version
>2. As a target for cross-references to figures, tables and headings.
>
>The first one is the most important, since it applies to all elements, instead of
>only those that are targets of cross-references.
>
>The number included in the id attribute is the “sequence number” of the
>node in the document (the seqNo field of DFNode). During parsing, these are
>assigned sequentially, starting from 0; as a result, sequence numbers in a
>document immediately after parsing are in the same order as the nodes
>appear in the originating XML file.
>
>This ordering does not really matter as such, but the consistency does - two
>parses of the same XML file are guaranteed to produce the same sequence
>numbers. The update process (HTML -> docx) relies on this guarantee, since it
>re-parses the docx file from which the HTML was generated, and assumes
>that the ids in the HTML match up with the sequence numbers obtained from
>the parse.
>
>When new nodes are added to a document after parsing, they are assigned
>new sequence numbers consecutively, starting with the first number after
>what has been assigned so far.
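
The numbering scheme described above can be sketched roughly as follows. This is only an illustration: SketchNode, SketchDoc and sketchCreateNode are made-up stand-ins, not the real DFNode/DFDocument code, and the only thing taken from the description is that numbers start at 0 and each new node gets the next unused one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real DFNode/DFDocument definitions. */
typedef struct SketchNode {
    unsigned int seqNo;      /* position in creation order */
} SketchNode;

typedef struct SketchDoc {
    unsigned int nextSeqNo;  /* first sequence number not yet assigned */
} SketchDoc;

/* Create a node and give it the next unused sequence number.
 * The parser and any later insertions would both go through this
 * path, so numbering simply continues after parsing finishes. */
static SketchNode *sketchCreateNode(SketchDoc *doc)
{
    assert(doc != NULL);
    SketchNode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->seqNo = doc->nextSeqNo++;
    return node;
}
```

Because the counter only depends on the order in which nodes are created, parsing the same XML twice with a parser that visits nodes in file order gives the same numbers both times.
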
>
>DFDocument maintains a mapping from id attributes to Nodes. So if you have
>a node in the document.xml file, say, and you want to find the corresponding
>HTML element (if it exists), then you construct a string with the id prefix and
>the sequence number, and then do a lookup in the nodesByIdAttr hash table
>of the DFDocument object. There is a convenience function that does this,
>called DFElementForIdAttr(). This function is used in WordBookmarks and
>WordFields for dealing with cross-references.
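
A rough sketch of the "prefix plus sequence number" lookup described above. The signature of DFElementForIdAttr() is not shown in this thread, so the code does not call it; SketchElement, sketchFormatId and sketchElementForId are hypothetical names, the "doc-" prefix in the usage note is invented, and a linear search stands in for the nodesByIdAttr hash table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an HTML element with an id attribute. */
typedef struct SketchElement {
    const char *idAttr;   /* value of the element's id attribute */
    const char *tagName;
} SketchElement;

/* Build "<prefix><seqNo>" into buf.
 * Returns 1 on success, 0 if the result did not fit. */
static int sketchFormatId(char *buf, size_t size,
                          const char *prefix, unsigned int seqNo)
{
    assert(buf != NULL && prefix != NULL);
    int n = snprintf(buf, size, "%s%u", prefix, seqNo);
    return n >= 0 && (size_t)n < size;
}

/* Linear search standing in for the hash table lookup.
 * Returns NULL when no element carries the given id. */
static const SketchElement *sketchElementForId(const SketchElement *elems,
                                               size_t count,
                                               const char *idAttr)
{
    assert(idAttr != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(elems[i].idAttr, idAttr) == 0)
            return &elems[i];
    }
    return NULL;
}
```

With an assumed prefix of "doc-", a concrete node with sequence number 42 would be looked up under the key "doc-42".
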
>
>WordConverterCreateAbstract() is used for creating a HTML element in the
>‘get’ operation. It sets the id attribute based on the prefix used during
>conversion, and the sequence number of the supplied concrete element. This
>sets up the relationship, which is subsequently used in the ‘put’ operation.
>
>WordConverterGetConcrete() does the reverse. It takes as input a HTML
>element from the abstract document, and checks to see if it has an id
>attribute. If so, it extracts the sequence number from the attribute, and uses
>that to locate the concrete element (typically in document.xml) from which
>that HTML element was originally derived.
>
>Once it has determined the sequence number, WordConverterGetConcrete()
>calls DFNodeForSeqNo(), which uses a hash table maintained by the
>document to map sequence numbers to nodes. The result may be NULL,
>indicating that there is no such node in the document, though in general that’s
>unlikely.
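
The reverse direction (id attribute back to sequence number) might look something like this. It is a sketch under assumptions, not the code inside WordConverterGetConcrete(): sketchSeqNoFromId is a hypothetical helper, and the exact id format (a fixed prefix followed only by decimal digits) is assumed rather than taken from the source.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Extract the sequence number from an id of the assumed form
 * "<prefix><decimal digits>".
 * Returns 1 and stores the number on success; returns 0 if the id is
 * missing, has a different prefix, or has a malformed numeric part. */
static int sketchSeqNoFromId(const char *idAttr, const char *prefix,
                             unsigned int *seqNo)
{
    assert(prefix != NULL && seqNo != NULL);
    if (idAttr == NULL)
        return 0;                     /* element has no id attribute */

    size_t prefixLen = strlen(prefix);
    if (strncmp(idAttr, prefix, prefixLen) != 0)
        return 0;                     /* id was not generated by us */

    const char *digits = idAttr + prefixLen;
    if (*digits < '0' || *digits > '9')
        return 0;                     /* empty, signed or padded number */

    errno = 0;
    char *end = NULL;
    unsigned long value = strtoul(digits, &end, 10);
    if (errno != 0 || *end != '\0' || value > UINT_MAX)
        return 0;                     /* overflow or trailing characters */

    *seqNo = (unsigned int)value;
    return 1;
}
```

A caller would then pass the extracted number to the sequence-number lookup and still has to handle a NULL result, since an id that parses correctly can refer to a node that no longer exists.
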
>
>The most important use of WordConverterGetConcrete() is in
>WordContainerPut(), which is a wrapper around BDTContainerPut. The
>BDTContainerPut function is what handles the re-ordering of nodes (e.g. if a
>paragraph was moved to a different part of the HTML document, we move its
>counterpart in document.xml, retaining all supported and unsupported
>properties, e.g. certain formatting options that can’t be expressed in HTML).
>
>Hope this clears things up a little bit… let me know if you need me to clarify
>anything further.
>
>And yes, I believe we’ll need the same thing for ODF, in order to properly
>handle bidirectional transformation, which allows us to preserve aspects of
>the ODF document that we don’t yet (or can’t) express in HTML. Perhaps this
>can be abstracted in a generic manner so that it can be used by both filters
>(and others in the future).
>
>—
>Dr Peter M. Kelly
>pmkelly@apache.org
>
>PGP key: http://www.kellypmk.net/pgp-key (fingerprint 5435 6718 59F0 DD1F BFA0
>5E46 2523 BAA1 44AE 2966)


I think I commented on this previously: I would use a data-* attribute to persist the DFNode
sequence number, instead of the HTML id. Using the id is a limitation for client apps, which
then cannot use the HTML id for their own purposes.

http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#embedding-custom-non-visible-data-with-the-data-attributes
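
To make the idea concrete, here is a small sketch of what the generated markup could look like. Everything in it is illustrative: sketchOpenTag is a hypothetical helper, and the attribute name data-seqno is only an example, not something Corinthia defines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write an opening tag that carries the sequence number in a data-*
 * attribute and leaves id free for the client application.
 * clientId may be NULL when the client has not set an id.
 * Returns 1 on success, 0 if the tag did not fit or clientId contains
 * a double quote (this sketch does no attribute escaping). */
static int sketchOpenTag(char *buf, size_t size, const char *tag,
                         const char *clientId, unsigned int seqNo)
{
    assert(buf != NULL && tag != NULL);
    if (clientId != NULL && strchr(clientId, '"') != NULL)
        return 0;

    int n;
    if (clientId != NULL)
        n = snprintf(buf, size, "<%s id=\"%s\" data-seqno=\"%u\">",
                     tag, clientId, seqNo);
    else
        n = snprintf(buf, size, "<%s data-seqno=\"%u\">", tag, seqNo);
    return n >= 0 && (size_t)n < size;
}
```

For a paragraph with sequence number 42 and a client-chosen id of "intro", this produces <p id="intro" data-seqno="42">, so the converter can still find its number while the id stays under the client's control.
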

franz

