cocoon-dev mailing list archives

From "J.Pietschmann" <j3322...@yahoo.de>
Subject Re: Random comments and bugfixes
Date Tue, 11 Nov 2003 18:04:46 GMT
Stefano Mazzocchi wrote:
> ... 0xd800 is not a legal XML character.
...
>    <high-unicode ...>&#65536;</high-unicode>
> 
> Now: whose problem is this Slide's or JDOM's?

JDOM, I'd guess without looking at the code. This is a very general
problem: the surrogate Unicode code points are illegal on their own in
XML, but of course in Java strings there is no way to express
non-baseplane characters other than as two surrogates. Problem:
if the test for illegal surrogates is before character reference
expansion, illegal surrogates may sneak in as char refs. If the test is
after character reference expansion, a non-baseplane character may
trigger a false positive. Obviously, the test has to be done twice,
once for literal characters and once as part of dealing with character
references.
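
A minimal sketch of the two checks in Java, to make the point concrete.
This is not JDOM's actual code; the class and method names are made up
for illustration. It assumes the XML 1.0 Char production and does the
literal check per code point, so a well-formed surrogate pair counts as
one non-baseplane character while a lone surrogate is rejected.

```java
// Hypothetical helper for illustration only; not taken from JDOM.
final class XmlCharCheck {
    private XmlCharCheck() {}

    /** XML 1.0 Char production, applied to a full Unicode code point. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Check 1: literal text held in a Java string. Iterates by code
     * point: codePointAt combines a valid surrogate pair into one value
     * of 0x10000 or above, and returns a lone surrogate unchanged, which
     * isXmlChar then rejects.
     */
    static boolean isLegalLiteral(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    /**
     * Check 2: the numeric value of a character reference, tested
     * before it is expanded into UTF-16 chars.
     */
    static boolean isLegalCharRef(int value) {
        return isXmlChar(value);
    }
}
```

With this sketch, isLegalCharRef(65536) accepts the char ref from the
example above, isLegalCharRef(0xD800) rejects a surrogate smuggled in as
a char ref, and isLegalLiteral accepts the surrogate pair that the
expanded reference turns into.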

I personally wouldn't lose much sleep over this particular problem.
Unless you are into MathML or obscure historic scripts, non-baseplane
characters are more of a curiosity.

J.Pietschmann

