tomcat-users mailing list archives

From André Warnier
Subject Re: [OT] Basic int/char conversion question
Date Tue, 13 Jan 2009 12:09:40 GMT

Christopher Schultz wrote:
> André,
> André Warnier wrote:
>> an existing webapp reads from a socket connected to an external program.
>> The input stream is created as follows :
>> fromApp = socket.getInputStream();
>> The read is as follows :
>> StringBuffer buf = new StringBuffer(2000);
>> int ic;
>> while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>>            buf.append((char)ic);
>> This is wrong, because it assumes that the input stream is always in an
>> 8-bit default platform encoding, which it isn't.
> Does it?
> The only assumption I see here is that the byte code 0x1a has a special
> meaning. Since ASCII is usually the lowest common denominator for
> character encodings, is this a bad assumption?

Considering the often devious ways in which character encoding questions
can come back to bite one, I am not so sure.
By doing a read(), the app currently "consumes" one byte, whether it
matches 0x1A or not. If the input stream were UTF-8, for instance, that
byte might be the 2nd or 3rd byte of a multi-byte "UTF-8 character"
sequence, which might happen to have the integer value 0x1A, although
its meaning would be totally different.
(I have not re-checked the UTF-8 encoding to verify whether that is a
possible value for a 2nd or 3rd byte, but I think it is.)
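
The parenthetical above can be checked directly. The sketch below is only
an illustration (the class and method names are made up for this example,
they are not part of the webapp). It encodes every code point above 0x7F
as UTF-8 and looks for a 0x1A byte. As far as I can tell, UTF-8 lead and
continuation bytes always have the high bit set, so 0x1A cannot occur
inside a multi-byte sequence. The concern does apply to other multi-byte
encodings such as UTF-16, where U+011A (E with caron) contains a 0x1A byte.

```java
import java.nio.charset.StandardCharsets;

public class Utf8SubCheck {
    // Returns true if the byte 0x1A occurs in the UTF-8 encoding of any
    // code point that needs more than one byte (U+0080 and above).
    public static boolean subAppearsInMultiByteUtf8() {
        for (int cp = 0x80; cp <= 0x10FFFF; cp++) {
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                continue; // lone surrogates are not encodable characters
            }
            byte[] bytes = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
            for (byte b : bytes) {
                if (b == 0x1A) {
                    return true;
                }
            }
        }
        return false;
    }

    // Returns true if the big-endian UTF-16 encoding of s contains 0x1A.
    public static boolean subAppearsInUtf16(String s) {
        for (byte b : s.getBytes(StandardCharsets.UTF_16BE)) {
            if (b == 0x1A) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(subAppearsInMultiByteUtf8());  // prints false
        // U+011A encodes as the two bytes 0x01 0x1A in UTF-16BE.
        System.out.println(subAppearsInUtf16("\u011A"));  // prints true
    }
}
```

So a byte-wise test for 0x1A looks safe on a UTF-8 stream, but not on
every multi-byte encoding.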

>> How do I do this correctly, assuming that I do know that the incoming
>> stream is an 8-bit stream (like iso-8859-x), and I do know which 8-bit
>> encoding is being used (such as iso-8859-1 or iso-8859-2) ?
>> I cannot change the InputStream into something else, because there are a
>> zillion other places where this webapp tests on the read byte's value,
>> numerically.

And there are other places where the "byte" is tested against values
other than 0x1A.

> I like Chuck's suggestion to use an InputStreamReader because the
> interfaces are (at least accidentally) the same, at least for the method
> in question. 

Me too. It is the most logical, and the one which I would apply if I 
were to rewrite this app from scratch.  I would also have the other app 
(the one which sends this stream to the webapp) send some kind of prefix 
to the stream, indicating the encoding used. (Or at least have both that 
app and the webapp have some external parameter telling them 
respectively what to send and what to expect).
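
For reference, a minimal sketch of what the InputStreamReader approach
might look like for the read loop quoted at the top. It assumes the
encoding is known in advance (ISO-8859-2 here, passed as a parameter) and
that the runtime provides that charset. The class name and the byte[]
helper are hypothetical and exist only to make the example runnable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class SubTerminatedReader {
    static final int SUB = 26; // hex 1A

    // Same loop as the original, but read() now returns a decoded
    // character (0..65535) or -1, instead of a raw byte value.
    public static String readMessage(Reader in) throws IOException {
        StringBuilder buf = new StringBuilder(2000);
        int ic;
        while ((ic = in.read()) != SUB && ic != -1) {
            buf.append((char) ic);
        }
        return buf.toString();
    }

    // Convenience wrapper for the example: decodes an in-memory byte
    // array with the named charset instead of a real socket stream.
    public static String readMessage(byte[] wire, String charsetName) {
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(wire), Charset.forName(charsetName))) {
            return readMessage(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 0xC8 is C with caron (U+010C) in ISO-8859-2; 0x1A ends the message.
        byte[] wire = { (byte) 0xC8, 'a', 'j', 0x1A, 'x' };
        String msg = readMessage(wire, "ISO-8859-2");
        System.out.println(msg.length());                      // prints 3
        System.out.println(Integer.toHexString(msg.charAt(0))); // prints 10c
    }
}
```

One caveat: InputStreamReader may read ahead and buffer bytes beyond the
0x1A, so with a real socket the same Reader would have to be created once
and reused for every subsequent read on that connection.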

> I'm not sure how you would modify an entire application to
> "fix" this code everywhere, though.

Right. I was trying to find a magic shortcut. At first I was hoping that 
I could just do some kind of "string replace patch" with Notepad, 
directly on the compiled classes.  Unfortunately, considering these byte 
tests in several places, I can't.

Thanks again for all the suggestions though.
