Mailing-List: contact dev-help@corinthia.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@corinthia.incubator.apache.org
Message-ID: <COL401-EAS7566CF13BF4558865547C6B5A60@phx.gbl>
From: Franz de Copenhague <franzdecopenhague@outlook.com>
To: <dev@corinthia.incubator.apache.org>
References: 
 <CAKw-CVL1O8wB3qxFD4utdvNP-62HNUtq=mESD8seLZeNq0VyAA@mail.gmail.com>
 <EC176CBC-4694-4DAB-B12B-736705DF9DC8@apache.org>
In-Reply-To: <EC176CBC-4694-4DAB-B12B-736705DF9DC8@apache.org>
Subject: RE: html ids
Date: Wed, 17 Jun 2015 17:44:59 -0400
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Thread-Index: AQABAgMEJwX7LJ3aJtn0ryh5HPEsAABfPaAFoU2NitA=
Content-Language: en-us


>-----Original Message-----
>From: Peter Kelly [mailto:pmkelly@apache.org]
>Sent: Wednesday, June 17, 2015 3:54 PM
>To: dev@corinthia.incubator.apache.org
>Subject: Re: html ids
>
>> On 17 Jun 2015, at 8:09 pm, Ian C <ian@amham.net> wrote:
>>
>> Hi Peter,
>>
>> when the Word converter creates an html element via the
>> WordConverterCreateAbstract function it creates an associated id attribute.
>>
>> Having examined the resulting html I see each element does have an id.
>>
>> Are these necessary and if so when and where? I'm guessing some sort
>> of lookup function somewhere?
>
>The id attributes are used for two purposes:
>
>1. To enable elements in an updated version of the document to be
>correlated with the elements from the original version
>2. As a target for cross-references to figures, tables and headings.
>
>The first one is the most important, since it applies to all elements,
>instead of only those that are targets of cross-references.
>
>The number included in the id attribute is the “sequence number” of the
>node in the document (the seqNo field of DFNode). During parsing, these are
>assigned sequentially, starting from 0; as a result, sequence numbers in a
>document immediately after parsing are in the same order as the nodes
>appear in the originating XML file.
>
>This ordering does not really matter as such, but the consistency does - two
>parses of the same XML file are guaranteed to produce the same sequence
>numbers. The update process (HTML -> docx) relies on this guarantee, since it
>re-parses the docx file from which the HTML was generated, and assumes
>that the ids in the HTML match up with the sequence numbers obtained from
>the parse.
>
>When new nodes are added to a document after parsing, they are assigned
>new sequence numbers consecutively, starting with the first number after
>what has been assigned so far.
>
>DFDocument maintains a mapping from id attributes to Nodes. So if you have
>a node in the document.xml file, say, and you want to find the corresponding
>HTML element (if it exists), then you construct a string with the id prefix and
>the sequence number, and then do a lookup in the nodesByIdAttr hash table
>of the DFDocument object. There is a convenience function that does this,
>called DFElementForIdAttr(). This function is used in WordBookmarks and
>WordFields for dealing with cross-references.
>
>WordConverterCreateAbstract() is used for creating a HTML element in the
>‘get’ operation. It sets the id attribute based on the prefix used during
>conversion, and the sequence number of the supplied concrete element. This
>sets up the relationship, which is subsequently used in the ‘put’ operation.
>
>WordConverterGetConcrete() does the reverse. It takes as input a HTML
>element from the abstract document, and checks to see if it has an id
>attribute. If so, it extracts the sequence number from the attribute, and uses
>that to locate the concrete element (typically in document.xml) from which
>that HTML element was originally derived.
>
>Once it has determined the sequence number, WordConverterGetConcrete()
>calls DFNodeForSeqNo(), which uses a hash table maintained by the
>document to map sequence numbers to nodes. The result may be NULL,
>indicating that there is no such node in the document, though in general
>that’s unlikely.
>
>The most important use of WordConverterGetConcrete() is in
>WordContainerPut(), which is a wrapper around BDTContainerPut. The
>BDTContainerPut function is what handles the re-ordering of nodes (e.g. if a
>paragraph was moved to a different part of the HTML document, we move its
>counterpart in document.xml, retaining all supported and unsupported
>properties, e.g. certain formatting options that can’t be expressed in HTML).
>
>Hope this clears things up a little bit… let me know if you need me to clarify
>anything further.
>
>And yes, I believe we’ll need the same thing for ODF, in order to properly
>handle bidirectional transformation, which allows us to preserve aspects of
>the ODF document that we don’t yet (or can’t) express in HTML. Perhaps this
>can be abstracted in a generic manner so that it can be used by both filters
>(and others in the future).
>
>--
>Dr Peter M. Kelly
>pmkelly@apache.org
>
>PGP key: http://www.kellypmk.net/pgp-key
>(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
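
As a rough illustration of the id round trip described in the quoted message (prefix plus sequence number written during 'get', parsed back during 'put'), here is a minimal C sketch. The names format_id_attr and parse_id_attr, and the "word" prefix used in the note below, are hypothetical and not Corinthia APIs; the real work is done by WordConverterCreateAbstract() and WordConverterGetConcrete(), whose internals may differ.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "<prefix><seqNo>" into buf.
   Returns 0 on success, -1 if the result does not fit in buf. */
int format_id_attr(char *buf, size_t size, const char *prefix,
                   unsigned int seqNo)
{
    assert(buf != NULL && prefix != NULL);
    int n = snprintf(buf, size, "%s%u", prefix, seqNo);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper: recover the sequence number from an id attribute.
   Returns 1 and stores the number in *seqNo if idAttr is exactly
   "<prefix><decimal digits>"; returns 0 otherwise. */
int parse_id_attr(const char *idAttr, const char *prefix,
                  unsigned int *seqNo)
{
    assert(prefix != NULL && seqNo != NULL);
    if (idAttr == NULL)
        return 0;                        /* element has no id attribute */

    size_t prefixLen = strlen(prefix);
    if (strncmp(idAttr, prefix, prefixLen) != 0)
        return 0;                        /* id belongs to someone else */

    const char *digits = idAttr + prefixLen;
    if (*digits < '0' || *digits > '9')
        return 0;                        /* empty, signed, or non-numeric */

    errno = 0;
    char *end = NULL;
    unsigned long value = strtoul(digits, &end, 10);
    if (errno != 0 || *end != '\0' || value > UINT_MAX)
        return 0;                        /* overflow or trailing garbage */

    *seqNo = (unsigned int)value;
    return 1;
}
```

With a prefix of "word", format_id_attr() turns sequence number 42 into "word42", and parse_id_attr() maps "word42" back to 42 while rejecting ids such as "fig42" or "word4x". A real implementation would then look the number up in the document's sequence-number table.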


I think I commented on this previously: I would use a data-* attribute to
persist the DFNode sequence number, instead of the HTML id. Using the id is a
limitation for the client app, which then cannot use the HTML id for its own
purposes.

http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#embedding-custom-non-visible-data-with-the-data-attributes
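
To make the data-* suggestion concrete, here is a small C sketch of what the generated markup could look like. This is only an assumption about one possible shape: the attribute name data-seqno and the function format_open_tag are hypothetical, and nothing like this exists in the current converter as far as this thread shows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write an opening tag that carries the DFNode
   sequence number in a data-seqno attribute, leaving id free for the
   client application. clientId may be NULL when the client set no id.
   Note: clientId is copied as-is; a real implementation would need to
   escape attribute values.
   Returns 0 on success, -1 if the tag does not fit in buf. */
int format_open_tag(char *buf, size_t size, const char *tag,
                    const char *clientId, unsigned int seqNo)
{
    assert(buf != NULL && tag != NULL);
    int n;
    if (clientId != NULL)
        n = snprintf(buf, size, "<%s id=\"%s\" data-seqno=\"%u\">",
                     tag, clientId, seqNo);
    else
        n = snprintf(buf, size, "<%s data-seqno=\"%u\">", tag, seqNo);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For a paragraph with sequence number 17 and a client-chosen id of "intro", this would produce <p id="intro" data-seqno="17">, so the 'put' operation could read data-seqno while the client keeps control of id.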

franz