From: "Caldarale, Charles R"
To: Tomcat Users List
Date: Thu, 1 Jan 2009 19:32:56 -0600
Subject: RE: [OT] Basic int/char conversion question

> From: André Warnier [mailto:aw@ice-sa.com]
> Subject: Re: [OT] Basic int/char conversion question
>
> Suppose I do this :
>
> String knownEncoding = "ISO-8859-1"; // or "ISO-8859-2"
> InputStreamReader fromApp;
> fromApp = new InputStreamReader(socket.getInputStream(),
>     Charset.forName(knownEncoding));
> int ic = 0;
> StringBuffer buf = new StringBuffer(2000);
> while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>     buf.append((char)ic);
>
> .. then I'm still appending the same char (really, byte) to my
> buffer, right ?

No, it's not the same. It's the proper Unicode equivalent of the input
byte (or bytes, for multi-byte character sets), not the original 8-bit
value. You're responsible for setting the appropriate character set on
the InputStreamReader constructor to ensure that conversion takes place.

> But by doing
> buf.append((char) ic)
> I am still interpreting ic as being, by platform default, ISO-8859-1,
> thus I am still appending the Unicode codepoint U00B5.

That's not correct.
The interpretation occurs on the read() operation on t= he InputStreamReader, not the cast to a char. The read() already converted= the byte according to the specified Charset; if your input is 8859-2, you = must use that on the InputStreamReader constructor. > Or, can I / do I have to now also say : > char ic =3D 0; > while((ic =3D fromApp.read()) !=3D 26 && ic !=3D -1) // hex 1A (SUB) > buf.append(ic); That can't ever work, since a char is unsigned, so can never have a value o= f -1; you will get a compilation error since the result of the read() is an= int, not a char. > In other words, in order to keep my changes and post-festivities > headaches to a minimum, I would like to keep buf being a StringBuffer. Which is exactly why you should use an InputStreamReader, not an InputStrea= m, and not change anything else. > So what I was really looking for was the correct alternative to > buf.append((char) ic); You're looking in the wrong place; the conversion should occur as the input= is being read, not during the append(). > A cursory examination of the webapp code seems to show that > the byte in question is only ever compared to either -1 or > integers below 127, or characters in the lower ASCII range > "A-Za-z". Excellent; then wrappering the InputStream with an InputStreamReader set to= the appropriate character set is *exactly* what you need. > But is > if (char =3D=3D some-integer) > always valid as a replacement for > if (int =3D=3D some-integer) No; a char is unsigned, which is why all read() methods return an int, not = a byte or a char. - Chuck THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MA= TERIAL and is thus for use only by the intended recipient. If you received = this in error, please contact the sender and delete the e-mail and its atta= chments from all computers. 