To: ant-user@jakarta.apache.org
Subject: Re: Ant, SAX Parser and Internationalization
From: Stefan Bodewig
Date: 21 Feb 2002 16:43:22 +0100

On Thu, 21 Feb 2002, Paul Smiley wrote:

> "...really use UTF-8" - am I not using UTF-8 when using
> 'encoding="UTF-8"'?

No, you only claim you'd be using UTF-8. æ is the ISO-8859-1 encoded
version of the Unicode character with the number 230. The UTF-8
encoded version consists of the two bytes Ã¦.
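
To make the difference concrete, here is a minimal sketch that prints
the bytes of character 230 in both encodings. The class name
EncodingDemo and the helper hex() are made up for illustration, and it
assumes a JDK that has java.nio.charset.StandardCharsets (Java 7 or
later).

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Render a byte array as space-separated uppercase hex, e.g. "C3 A6".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ae = "\u00e6"; // Unicode character number 230

        System.out.println((int) ae.charAt(0));                             // 230
        System.out.println(hex(ae.getBytes(StandardCharsets.ISO_8859_1)));  // E6
        System.out.println(hex(ae.getBytes(StandardCharsets.UTF_8)));       // C3 A6

        // Reading the two UTF-8 bytes back as ISO-8859-1 yields two
        // characters instead of one, which is what a mislabelled file
        // looks like to a reader.
        String misread = new String(ae.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(misread.length());                               // 2
    }
}
```

A file that contains the single byte E6 but declares encoding="UTF-8"
is therefore not valid UTF-8, whatever the declaration says.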

> Is there some type of byte mark as there is with UTF-16?

UTF-8 uses between one and four bytes to encode a character - only
the first 128 characters (the ASCII range) use a one byte encoding,
and everything in the sixteen bit range needs at most three. I'm sure
you'll find more than enough resources on the web that will give you
the full details. You could write your XML file using Java and set the
encoding of your OutputStreamWriter to UTF8 to see what it will look
like.

> Also, I need to support Kanji and Chinese characters, so I believe
> that UTF-8 and ISO-8859-1 are inadequate.

UTF-8 is probably fine, ISO-8859-1 is completely inadequate. UTF-8 is
one encoding for the complete Unicode character set, as is UTF-16.
ISO-8859-1 is a separate, much smaller character set that happens to
be identical with the first 256 characters of Unicode, and it is the
character set used by default on most operating systems in the US and
western Europe.

Stefan
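
PS: The OutputStreamWriter experiment suggested above could look
roughly like the following sketch. The class name WriteUtf8Xml, the
method writeXml(), the file name sample.xml and the <message> element
are all made up for illustration, and the text is assumed to contain
no characters that would need XML escaping. It uses try-with-resources
and java.nio.file, so it assumes Java 7 or later.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class WriteUtf8Xml {
    // Write a tiny XML document whose bytes really are UTF-8, so that
    // the encoding declaration matches the content.
    static void writeXml(OutputStream out, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            // text is assumed to contain no '<', '>' or '&'
            w.write("<message>" + text + "</message>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("sample.xml"); // hypothetical output file
        try (OutputStream out = Files.newOutputStream(file)) {
            // One Latin-1 range character plus three Kanji characters.
            writeXml(out, "\u00e6 \u65e5\u672c\u8a9e");
        }
        // Inspect the result with a hex viewer to see the multi-byte
        // sequences.
        System.out.println(Files.readAllBytes(file).length + " bytes written");
    }
}
```

In the resulting file the æ takes two bytes and each of the Kanji
characters takes three, which no ISO-8859-1 file could represent.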