cocoon-dev mailing list archives

From "Konstantin Piroumian" <kpiroum...@apache.org>
Subject Re: [d'oh!] java APIs are not powerful enough to handle the XML spec!!
Date Fri, 14 Nov 2003 08:32:06 GMT
According to the Unicode Design Principles in the Unicode 3.0 specification:
<quot>Unicode characters have a width of 16 bits.</quot>

The Unicode 4.0 standard, however, has no principles related to character
width.

And according to the JavaDocs of the Character class (J2SE 1.4):
<quot>Character information is based on the Unicode Standard, version 3.0.
</quot>
And one more from the Java Language Specification:
<quot>
Versions of the Java programming language prior to 1.1 used Unicode version
1.1.5 (see The Unicode Standard: Worldwide Character Encoding (1.4) and
updates). Later versions prior to JDK version 1.1.7 used Unicode version
2.0. Since JDK version 1.1.7, Unicode 2.1 has been in use. The Java platform
will track the Unicode specification as it evolves. The precise version of
Unicode used by a given release is specified in the documentation of the
class Character.
</quot>

So, it seems that the only thing (optimistic mode is on) that should be
changed in future versions of Java to support Unicode 4.0 is the
Character class.

Regards,
  Konstantin Piroumian

----- Original Message ----- 
From: "Stefano Mazzocchi" <stefano@apache.org>
To: "Apache Cocoon" <dev@cocoon.apache.org>
Sent: Thursday, November 13, 2003 21:06
Subject: [d'oh!] java APIs are not powerful enough to handle the XML spec!!


The day somebody asks you why java needs to be replaced, one answer
will be 'it only supports 16-bit chars'. laughable as it might seem,
it's true.

yes, people, a Unicode char is not 16 bits (as I always thought!) but 32!!

And even the XML specification says so.

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]

do the math and you find that #x10000 cannot fit in 16 bits!
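
To make the arithmetic concrete, here is a minimal sketch in Java. It assumes a release newer than the J2SE 1.4 discussed in this thread: Character.toChars was only added in J2SE 5.0. The class and method names (CodePointWidth, fitsInOneChar, utf16Length) are made up for illustration.

```java
public class CodePointWidth {
    // Largest value a single 16-bit Java char can hold.
    static final int MAX_16_BIT = 0xFFFF;

    /** True if the code point fits in one 16-bit char. */
    static boolean fitsInOneChar(int codePoint) {
        return codePoint >= 0 && codePoint <= MAX_16_BIT;
    }

    /** Number of 16-bit chars needed to hold the code point as UTF-16. */
    static int utf16Length(int codePoint) {
        // Character.toChars exists only since J2SE 5.0 (assumption: not 1.4).
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        int first = 0x10000; // first code point of the last range in Char
        System.out.println(fitsInOneChar(first)); // false
        char[] units = Character.toChars(first);
        System.out.println(units.length); // 2
        // The two 16-bit units form a surrogate pair.
        System.out.println(Integer.toHexString(units[0]) + " "
                + Integer.toHexString(units[1])); // d800 dc00
    }
}
```

0x10000 is 2^16, one more than the largest 16-bit value, so it needs two 16-bit units instead of one.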

now, if you thought you could take the characters() SAX event and create
a String out of it and do something useful with it (like print it, for
example), forget it. The result will very likely not be the one you
expect.
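
One way to see what actually arrives in the String is to parse a document containing &#x10000; and inspect the result. This is only a sketch under assumptions: it uses the JAXP parser bundled with the JDK, and the code point methods on String (codePointCount, codePointAt) come from J2SE 5.0, so they are not available on the 1.4 release discussed here. The class name SaxSupplementaryDemo and the helper textOf are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSupplementaryDemo {
    /** Collects all character data reported by the SAX parser into a String. */
    static String textOf(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // characters() may be called several times per text node,
                // so append each chunk instead of assuming a single call.
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String s = textOf("<a>&#x10000;</a>");
        System.out.println(s.length());                           // 2 (16-bit units)
        System.out.println(s.codePointCount(0, s.length()));      // 1 (Unicode character)
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 10000
    }
}
```

The mismatch between length() and the number of characters is the kind of surprise meant here: code that indexes or truncates the String one char at a time can cut such a character in half.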

Another reason not to use Strings at all.

--
Stefano.

