ant-user mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject Re: Ant, SAX Parser and Internationalization
Date Thu, 21 Feb 2002 15:43:22 GMT
On Thu, 21 Feb 2002, Paul Smiley <ps180001@exchange.DAYTONOH.NCR.com>
wrote:

> "...really use UTF-8" - am I not using UTF-8 when using
> 'encoding="UTF-8"'?

No, you only claim you'd be using UTF-8.

æ (the single byte 0xE6) is the ISO-8859-1 encoded version of the
Unicode character with the number 230.  The UTF-8 encoded version
consists of the two bytes 0xC3 0xA6, which show up as Ã¦ when they are
read as ISO-8859-1.
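To see the difference for yourself, here is a minimal sketch that prints the bytes of character 230 in both encodings.  The class name EncodedBytes and the helper hex are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedBytes {
    // Hypothetical helper: the bytes of s in the given charset,
    // formatted as space-separated upper-case hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E6 (decimal 230), written as an escape so the source
        // file's own encoding does not matter.
        String ae = "\u00e6";
        System.out.println("ISO-8859-1: " + hex(ae, StandardCharsets.ISO_8859_1)); // E6
        System.out.println("UTF-8:      " + hex(ae, StandardCharsets.UTF_8));      // C3 A6
    }
}
```

A file that declares encoding="UTF-8" but contains the lone byte E6 is not valid UTF-8, which is why the parser complains.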

> Is there some type of byte mark as there is with UTF-16?

UTF-8 uses between one and four bytes to encode a character (one to
three for anything in the sixteen bit range) - only the first 128
characters, i.e. ASCII, use a one byte encoding.  I'm sure you'll find
more than enough resources that will give you the full details on the
web.  You could write your XML file using Java and set the encoding of
your OutputStreamWriter to UTF8 to see what it will look like.
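The OutputStreamWriter suggestion could look roughly like the sketch below.  The class name WriteUtf8Xml, the method encode and the sample project element are assumptions for illustration; it writes into a byte array instead of a file so the result can be dumped as hex, and it uses try-with-resources and UncheckedIOException, so it assumes Java 8 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class WriteUtf8Xml {
    // Hypothetical helper: run the text through an OutputStreamWriter
    // whose encoding is set to UTF8 and return the resulting bytes.
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, "UTF8")) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Illustrative build file fragment; \u00e6 is character 230.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                   + "<project name=\"\u00e6\"/>\n";
        StringBuilder dump = new StringBuilder();
        for (byte b : encode(xml)) {
            dump.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(dump.toString().trim());
    }
}
```

To write a real file, wrap a FileOutputStream instead of the ByteArrayOutputStream.  In the dump the ASCII characters each take one byte, the æ shows up as C3 A6, and the output starts directly with 3C 3F ("<?") because Java's UTF8 writer does not add a byte order mark.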

> Also, I need to support Kanji and Chinese characters, so I believe
> that UTF-8 and ISO-8859-1 are inadequate.

UTF-8 is probably fine, ISO-8859-1 is completely inadequate.

UTF-8 is one encoding for the complete Unicode character set, as is
UTF-16.  ISO-8859-1 is a completely different character set that
happens to be identical with the first 256 characters of Unicode, and
it is the character set used by default on most operating systems in
the US and western Europe.
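A quick way to check which encoding can hold your text is to ask the charset's encoder.  This is only a sketch: the class name CanEncode, the helper fits and the sample word (the two characters for "kanji", U+6F22 U+5B57) are chosen for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CanEncode {
    // Hypothetical helper: true if every character of s can be
    // represented in the given charset.
    static boolean fits(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String kanji = "\u6f22\u5b57"; // the word "kanji", two characters
        System.out.println("UTF-8:      " + fits(kanji, StandardCharsets.UTF_8));
        System.out.println("UTF-16:     " + fits(kanji, StandardCharsets.UTF_16));
        System.out.println("ISO-8859-1: " + fits(kanji, StandardCharsets.ISO_8859_1));

        // Each of these two characters needs three bytes in UTF-8.
        System.out.println("UTF-8 length: "
                + kanji.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

ISO-8859-1 has exactly 256 code points, so anything above Unicode number 255 cannot be written in it at all.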

Stefan

--
To unsubscribe, e-mail:   <mailto:ant-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:ant-user-help@jakarta.apache.org>

