tomcat-users mailing list archives

From "Caldarale, Charles R" <Chuck.Caldar...@unisys.com>
Subject RE: [OT] Basic int/char conversion question
Date Fri, 02 Jan 2009 01:32:56 GMT
> From: André Warnier [mailto:aw@ice-sa.com]
> Subject: Re: [OT] Basic int/char conversion question
>
> Suppose I do this :
>
> String knownEncoding = "ISO-8859-1"; // or "ISO-8859-2"
> InputStreamReader fromApp;
> fromApp = new InputStreamReader(socket.getInputStream(),
> Charset.forName(knownEncoding));
> int ic = 0;
> StringBuffer buf = new StringBuffer(2000);
> while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>            buf.append((char)ic);
>
> .. then I'm still appending the same char (really, byte) to my
> buffer, right ?

No, it's not the same.  It's the proper Unicode equivalent of the input byte (or bytes, for
multi-byte character sets), not the original 8-bit value.  You're responsible for setting
the appropriate character set on the InputStreamReader constructor to ensure that the correct
conversion takes place.
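A minimal, self-contained sketch of the quoted loop, assuming a ByteArrayInputStream stands in for socket.getInputStream(); the class ReadUntilSub and the helper readUntilSub are hypothetical names. The same input byte 0xB5 comes out of the reader as a different Unicode value depending only on the Charset given to the InputStreamReader constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class ReadUntilSub {
    // Hypothetical helper: reads characters until SUB (0x1A) or end of stream.
    static String readUntilSub(InputStream in, String knownEncoding) throws IOException {
        Reader fromApp = new InputStreamReader(in, Charset.forName(knownEncoding));
        StringBuffer buf = new StringBuffer(2000);
        int ic;
        // read() returns an already-decoded UTF-16 value (0..0xFFFF), or -1 at end of stream
        while ((ic = fromApp.read()) != 0x1A && ic != -1) {
            buf.append((char) ic); // plain narrowing cast; no charset lookup happens here
        }
        return buf.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'A', 0xB5, SUB, 'Z' (the 'Z' after SUB is never appended)
        byte[] input = { 0x41, (byte) 0xB5, 0x1A, 0x5A };
        String latin1 = readUntilSub(new ByteArrayInputStream(input), "ISO-8859-1");
        String latin2 = readUntilSub(new ByteArrayInputStream(input), "ISO-8859-2");
        System.out.printf("ISO-8859-1: U+%04X%n", (int) latin1.charAt(1)); // U+00B5 (micro sign)
        System.out.printf("ISO-8859-2: U+%04X%n", (int) latin2.charAt(1)); // U+013E (l with caron)
    }
}
```

Note that buf stays a StringBuffer and the append() line is unchanged; only the reader's Charset decides what ends up in it.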

> But by doing
>         buf.append((char) ic)
> I am still interpreting ic as being, by platform default, ISO-8859-1,
> thus I am still appending the Unicode codepoint U00B5.

That's not correct.  The interpretation occurs in the read() operation on the InputStreamReader,
not in the cast to a char.  The read() has already converted the byte according to the specified
Charset; if your input is ISO-8859-2, you must specify that in the InputStreamReader constructor.
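To illustrate where the decoding happens, here is a small sketch (CastIsNotDecoding and decode are hypothetical names). With UTF-8, the reader consumes two bytes and returns a single value from read(), so the conversion clearly happened there; the (char) cast afterwards is a plain 16-bit narrowing that never consults a Charset or the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CastIsNotDecoding {
    // Hypothetical helper: decodes all bytes through a Reader, one read() at a time.
    static String decode(byte[] bytes, Charset cs) throws IOException {
        Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs);
        StringBuilder sb = new StringBuilder();
        int ic;
        while ((ic = r.read()) != -1) {
            sb.append((char) ic); // the value was already decoded by read()
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 encodes U+00B5 as two bytes, C2 B5; read() returns one value for both.
        String s = decode(new byte[] { (byte) 0xC2, (byte) 0xB5 }, StandardCharsets.UTF_8);
        System.out.printf("length = %d, U+%04X%n", s.length(), (int) s.charAt(0));

        // The (char) cast is lossless for every value read() can return other than -1.
        for (int v = 0; v <= 0xFFFF; v++) {
            if ((int) (char) v != v) throw new AssertionError(v);
        }
        System.out.println("cast preserved every value 0..0xFFFF");
    }
}
```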

> Or, can I / do I have to now also say :
> char ic = 0;
> while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>            buf.append(ic);

That can't ever work: a char is unsigned, so it can never have a value of -1.  In any case you
will get a compilation error, because read() returns an int, and an int cannot be assigned to a
char without a cast.
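A short sketch of why the char version of the loop can never terminate on end of stream (CharIsUnsigned and looksLikeEndOfStream are hypothetical names). Casting -1 to char wraps it to '\uFFFF', and comparing a char with -1 promotes the char to an int in the range 0..65535, so the test is always false.

```java
public class CharIsUnsigned {
    // Hypothetical check mirroring the quoted loop's "ic != -1" test, but on a char.
    static boolean looksLikeEndOfStream(char c) {
        return c == -1; // c is promoted to an int in 0..65535, so this is always false
    }

    public static void main(String[] args) {
        char c = (char) -1;                           // -1 narrows to '\uFFFF'
        System.out.println((int) c);                  // 65535
        System.out.println(looksLikeEndOfStream(c));  // false
        // char bad = someReader.read();  // does not compile: read() returns int
    }
}
```

Keeping ic as an int, and casting to char only after the -1 and SUB checks, avoids the problem entirely.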

> In other words, in order to keep my changes and post-festivities
> headaches to a minimum, I would like to keep buf being a StringBuffer.

Which is exactly why you should use an InputStreamReader, not an InputStream, and not change
anything else.

> So what I was really looking for was the correct alternative to
>            buf.append((char) ic);

You're looking in the wrong place; the conversion should occur as the input is being read,
not during the append().

> A cursory examination of the webapp code seems to show that
> the byte in question is only ever compared to either -1 or
> integers below 127, or characters in the lower ASCII range
> "A-Za-z".

Excellent; then wrapping the InputStream with an InputStreamReader set to the appropriate
character set is *exactly* what you need.
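Since the webapp only compares the value against -1 and the ASCII range, a quick sanity check is that both candidate charsets decode every byte 0x00..0x7F to the same code point. This sketch uses new String(byte[], Charset), which applies the same charset decoding an InputStreamReader would; AsciiRangeUnchanged and asciiCompatible are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiRangeUnchanged {
    // Hypothetical check: does every byte 0x00..0x7F decode to the identical code point?
    static boolean asciiCompatible(Charset cs) {
        for (int b = 0; b < 0x80; b++) {
            String s = new String(new byte[] { (byte) b }, cs);
            if (s.length() != 1 || s.charAt(0) != b) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(asciiCompatible(StandardCharsets.ISO_8859_1));    // true
        System.out.println(asciiCompatible(Charset.forName("ISO-8859-2"))); // true
        // So tests such as (ic >= 'A' && ic <= 'Z') or ic < 127 behave exactly as before.
    }
}
```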

> But is
> if (char == some-integer)
> always valid as a replacement for
> if (int == some-integer)

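No; a char is unsigned, so it can never equal -1 (or any other negative value).  That is why
the single-character read() methods return an int, not a byte or a char: an int has room for
every possible value plus the -1 end-of-stream marker.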

 - Chuck

