commons-dev mailing list archives

From Jörg Schaible <>
Subject Re: [LANG] Wanted - spec lawyer.
Date Tue, 30 Jun 2009 07:57:01 GMT
Hi Hen,

Henri Yandell wrote on Tuesday, 30 June 2009 09:15:

> Now that the StringEscape system has a foundation to support
> whatever's needed (one hopes) the next step is to define exactly what
> escaping XML should do. As Jörg notes in LANG-66, XML is different for
> XML 1.0 and 1.1. Great, let's support both then. StringEscapeUtils can
> support the old method (for now) with whatever legacy we have to put
> in there, but EscapeUtils and UnescapeUtils can be 'correct'.
> A core question is what to do about > 0x7f unicode characters.
> Escaping them seems bad, yet we did it a lot. In escapeJava, in
> escapeXml, in escapeHtml.

As pointed out, the XML 1.0 and XML 1.1 specifications each define the set
of valid characters (the Char production) for their version.

However, escaping is a separate concern. Whether you transport XML (or HTML)
in a UTF-8 encoded text file or in one encoded as 7-bit ASCII makes a big
difference: in the former case you don't have to escape any character above
0x7f, while in the latter you have to escape every one of them. This applies
equally to XML, HTML and Java source files.
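
A minimal sketch of that point in Java, assuming the escaper decides per code
point with java.nio.charset.CharsetEncoder.canEncode whether the target
encoding can carry a character. The class and method names are hypothetical
and not part of Commons Lang; the markup characters are always escaped.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: the escaped form depends on the target charset.
final class EncodingAwareEscape {

    // Escapes the five XML markup characters and, in addition, every code
    // point the target charset cannot represent, as a numeric character
    // reference. Unpaired surrogates are not handled specially here.
    static String escape(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            String chars = new String(Character.toChars(cp));
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (encoder.canEncode(chars)) {
                        out.append(chars);                        // representable as-is
                    } else {
                        out.append(String.format("&#x%X;", cp)); // e.g. &#xF6;
                    }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "J\u00f6rg <dev>";
        System.out.println(escape(s, StandardCharsets.UTF_8));    // Jörg &lt;dev&gt;
        System.out.println(escape(s, StandardCharsets.US_ASCII)); // J&#xF6;rg &lt;dev&gt;
    }
}
```

The same input yields two different, equally correct documents, which is why
the target encoding would have to be known to the escaper.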

The character set defined by each of the two XML versions is an orthogonal
condition. An attempt to encode a character outside the set allowed by the
XML version is a situation that cannot be handled and should raise an
exception (as every XML parser will do anyway).
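
A minimal Java sketch of that check, assuming the Char productions from
section 2.2 of the XML 1.0 (Fifth Edition) and XML 1.1 specifications. The
names XmlCharCheck, XmlVersion and requireValid are hypothetical; the point is
only that an illegal character is rejected rather than escaped.

```java
// Hypothetical illustration of a per-version character check.
final class XmlCharCheck {

    enum XmlVersion { XML_1_0, XML_1_1 }

    // XML 1.0: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Throws instead of silently escaping or dropping an illegal character.
    // Unpaired surrogates fall outside both productions and are rejected too.
    static void requireValid(String text, XmlVersion version) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean ok = version == XmlVersion.XML_1_0 ? isXml10Char(cp) : isXml11Char(cp);
            if (!ok) {
                throw new IllegalArgumentException(String.format(
                    "U+%04X at index %d is not a legal %s character", cp, i, version));
            }
            i += Character.charCount(cp);
        }
    }

    public static void main(String[] args) {
        requireValid("tab\tok", XmlVersion.XML_1_0);
        // Legal in 1.1, although as a RestrictedChar it must be written as &#x7;
        requireValid("bell\u0007", XmlVersion.XML_1_1);
        try {
            requireValid("bell\u0007", XmlVersion.XML_1_0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // U+0007 at index 4 is not a legal XML_1_0 character
        }
    }
}
```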

Therefore the question is whether (Un)EscapeUtils should actually be an
instance initialized with the target character encoding. And that raises
the question of how close we actually are to reimplementing

- Jörg
