ant-dev mailing list archives

From Dominique Devienne <ddevie...@gmail.com>
Subject Re: bug in DOMElementWriter
Date Sun, 27 Jun 2010 17:35:42 GMT
On Sat, Jun 26, 2010 at 6:54 PM, Jon Stevens <latchkey@gmail.com> wrote:
> For example, attr="&amp;" comes out as attr="&amp;" and not attr="&"... I
> don't have to write attr="&amp;amp;" to get what I want. The same is true
> with attr="&gt;"... it comes out as attr="&gt;" instead of attr=">". This is
> all because DOMElementWriter.encode() is smart about those entities.
>
> attr="&#10;" should come out as attr="&#10;", not attr="\n"

Well, I'm afraid Antoine is right, and the comparison you make is not "fair".

&, <, and > are "special" in XML, and must always be encoded in
attribute values and textual content. \n is not.

<echoxml> never sees the "&amp;" text, it sees whatever the XML parser
reports, a "&", and the XML serializer Ant uses knows it must encode
that char into "&amp;", thus it ends up back the way it was. But with
\n, which is just like any other character*, the serializer doesn't do
anything special, and the output will also contain a "plain" \n.
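
To make the round trip concrete, here is a minimal sketch using the JDK's
built-in DOM parser rather than Ant itself. The class name AttrRoundTrip and
the helper parsedAttr are made up for illustration; the point is only that
the application is handed the decoded character, never the entity text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AttrRoundTrip {
    // Hypothetical helper: parse a one-element document and return the
    // value of its "attr" attribute exactly as the DOM reports it.
    public static String parsedAttr(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("attr");
    }

    public static void main(String[] args) throws Exception {
        // The source text says &amp; but the parser reports a bare "&".
        String amp = parsedAttr("<e attr=\"&amp;\"/>");
        // The source text says &#10; but the parser reports a line feed.
        String nl = parsedAttr("<e attr=\"&#10;\"/>");

        System.out.println(amp.equals("&"));   // true
        System.out.println(nl.equals("\n"));   // true
    }
}
```

Any serializer working from these values has to re-encode the "&" to stay
well-formed, which is why it looks untouched in the output. Nothing in the
parsed value records that the line feed was originally written as &#10;.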

This is XML, and Ant can do nothing about it. The textual
representation of the "XML infoset" doesn't matter; what matters is
the info, and the XML parser doesn't always report the info as it was
in the text of the XML, but as its equivalent. Most parsers offer
configurations that control how they report stuff, but you can never
get a fully exact representation of the XML text without digging into
the parser itself. --DD

* Well it's whitespace, so it could be "normalized" too.
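
As a rough illustration of that footnote, the sketch below (again plain JDK
DOM, not Ant; AttrWhitespace and parsedAttr are illustrative names) compares
a literal line break inside an attribute value with the &#10; character
reference. It assumes a document with no DTD, so the attribute is treated as
CDATA.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AttrWhitespace {
    // Hypothetical helper: return the "attr" value as the DOM reports it.
    public static String parsedAttr(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("attr");
    }

    public static void main(String[] args) throws Exception {
        // A literal line feed in the attribute text is normalized to a space.
        String literal = parsedAttr("<e attr=\"a\nb\"/>");
        // A character reference survives parsing as a real line feed.
        String reference = parsedAttr("<e attr=\"a&#10;b\"/>");

        System.out.println(literal.equals("a b"));     // true
        System.out.println(reference.equals("a\nb"));  // true
    }
}
```

So a serializer that writes the line feed back out unescaped produces text
that a later parse would read as a space, not as the original line feed.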
