tomcat-users mailing list archives

From André Warnier (tomcat) ...@ice-sa.com>
Subject Re: Having Java websocket server in tomcat handle ISO8859_1
Date Mon, 08 Feb 2016 23:25:56 GMT
On 08.02.2016 23:31, Christopher Schultz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> All,
>
> On 2/8/16 3:43 PM, Mark Thomas wrote:
>> On 08/02/2016 18:41, Jason Ricles wrote:
>>> I have an application that sends binary websocket messages
>>> between a class and the web application using a websocket server
>>> written in java.
>>>
>>> The data being sent from the java class is encoded in a binary
>>> buffer with the bytes in ISO8859_1. However, when I receive the
>>> bytes on the websocket server and the web application end they
>>> are junk (such as -121, -116, etc.) and not encoded the correct
>>> way that they need to be.
>>
>> The bytes are transmitted as unsigned on the wire (as required by
>> the WebSocket spec). Java handles them as signed. You need to
>> convert them. Something like (untested):
>>
>> char c = (char) (b & 0xFF);
>
> I had to read this something like 10 times before I convinced myself
> that this was correct. For those who want to know why this makes any
> kind of sense (because, at first glance, it does not make any sense),
> I'll explain it.
>
> For starters, Java uses signed byte primitives but /unsigned/ char
> primitives. For those coming from the C world, that may be confusing.
> bytes are 8 (signed) bits and chars are 16 (unsigned) bits.
>
> But Java doesn't have any defined arithmetic operations (including
> bitwise) for anything smaller than an int (32 signed bits), so the
> above assignment is actually more like this:
>
> byte b = (byte) 0xab; // e.g.; the cast is needed because 0xab does not fit in a signed byte
> char c = (char) (((int) b) & 0xff);
>
> So, first b is widened from 8 bits to 32 bits -- with a
> sign-extension. That means that -1 is still -1, it's just represented
> by a different bit pattern: 1111 1111 1111 1111 1111 1111 1111 1111
> instead of 1111 1111.
>
> Next, the bitwise & is performed, which zeroes out everything but the
> bottom 8 bits (now we have .... .... 0000 0000 1111 1111). Then, that
> value is cast to char, which does practically nothing.
>
> In the above example (-1), we get a final value of 255 for c, which is
> exactly what you'd expect for an unsigned char whose signed value is -1.
>
> I think the only surprising thing there is that Java widens all types to
> 32-bit signed int to perform these operations. Without that fact, the
> above assignment doesn't make much sense. In C, that line of code
> would do absolutely nothing at all.
>
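
The widening and masking steps described in the quoted explanation can be
sketched in Java roughly as follows. This is only an illustration: the class
name SignedByteDemo and the helper toUnsignedChar are hypothetical names, not
part of Tomcat or the WebSocket API, and the sample value 0x87 is chosen
because it shows up as -121, one of the values the OP reported.

```java
class SignedByteDemo {
    // Hypothetical helper: maps a signed Java byte to its unsigned value (0..255) as a char.
    static char toUnsignedChar(byte b) {
        // b is sign-extended to a 32-bit int, the mask keeps only the low 8 bits,
        // and the cast narrows the resulting int to a 16-bit char.
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x87;                         // stored as -121 in a signed byte
        System.out.println(b);                        // prints -121
        System.out.println((int) toUnsignedChar(b));  // prints 135
        System.out.println(Byte.toUnsignedInt(b));    // prints 135 (JDK helper, Java 8+)
    }
}
```

If only the numeric value is needed, Byte.toUnsignedInt(b) performs the same
masking without going through char.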

Would a simpler way to say this not be that, in Java, a char is a 16-bit integer whose
value happens to be the corresponding character's Unicode code point?

Of course, this all takes us further away from the OP's original description of the issue,
which said "The data being sent from the java class is encoded in a binary
buffer with the bytes in ISO8859_1."
That basically doesn't make sense unless the data in question is originally text.
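
If the payload really is text that was encoded as ISO-8859-1, a minimal sketch
of turning the received bytes back into a String could look like the following.
It assumes the binary message arrives as a java.nio.ByteBuffer; the class name
Latin1Demo and the helper decodeLatin1 are hypothetical names used only for
illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class Latin1Demo {
    // Hypothetical helper: decodes the remaining bytes of a binary message as ISO-8859-1 text.
    static String decodeLatin1(ByteBuffer payload) {
        return StandardCharsets.ISO_8859_1.decode(payload).toString();
    }

    public static void main(String[] args) {
        // "caf\u00e9" ends in e-acute, which is byte 0xE9 in ISO-8859-1.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(raw[3]);                        // prints -23, which looks like "junk"
        String text = decodeLatin1(ByteBuffer.wrap(raw));
        System.out.println((int) text.charAt(3));          // prints 233
    }
}
```

The negative byte values are the same bits as the original characters; decoding
with the right charset recovers the text without masking each byte by hand.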




