tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: [OT] getRequestURI() in relation to Connector.URIEncoding
Date Sun, 17 Feb 2013 18:54:07 GMT
Mike Wilson wrote:
...
> 
> Example 2: path /ä in "binary" Unicode
>   GET /.. [0xC3,0xA4]
> 

To nitpick: this is not "binary Unicode". It is simply non-URL-encoded, raw UTF-8, which
is itself an encoding of Unicode.

The Unicode "codepoint" of "ä" is 0xE4 (decimal 228), usually represented as U+00E4.
That would be the "binary Unicode" value of this character (although one could argue that
"11100100" would be more proper for binary).
It represents the position of this character in the overall Unicode character table.

This is encoded as the 2 bytes [0xC3,0xA4] (decimal [195,164]) in the UTF-8 encoding.
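
To make the distinction concrete, here is a minimal Python sketch using only the standard
library. The variable names are mine and purely illustrative. It shows the code point, its
UTF-8 byte sequence, and the URL-encoded form of those same bytes:

```python
from urllib.parse import quote

ch = "\u00e4"  # the character "ä"

# The code point: the character's position in the Unicode table.
code_point = ord(ch)
code_point_label = f"U+{code_point:04X}"   # conventional notation
code_point_bits = format(code_point, "08b")  # the "more proper" binary form

# The UTF-8 encoding of that code point: two bytes.
utf8_bytes = ch.encode("utf-8")

# The URL-encoded (percent-encoded) form of the same UTF-8 bytes.
url_encoded = quote(ch)  # quote() uses UTF-8 by default

print(code_point, code_point_label, code_point_bits)
print(list(utf8_bytes), url_encoded)
```

The code point is a single number (228), while the UTF-8 form is a sequence of two bytes;
the URL-encoded form merely spells those two bytes out in ASCII.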

Confusion in terminology leads to "mojibake", which in German can be translated as 
"Buchstabensalat" (see http://en.wikipedia.org/wiki/Mojibake).
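
As an illustration of how such a salad comes about, here is a small Python sketch (my own
example, assuming the receiver decodes with ISO-8859-1 while the sender used UTF-8). The
same two bytes produce one character under one charset and two different characters under
the other:

```python
# The two raw bytes from the example request path.
raw = bytes([0xC3, 0xA4])

# Decoded with the charset the sender actually used.
as_utf8 = raw.decode("utf-8")

# Decoded with a different charset (assumed here: ISO-8859-1).
# Every byte maps to one character, so nothing fails; the result is just wrong.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
```

Note that the wrong decoding raises no error, which is why this kind of problem tends to
go unnoticed until someone looks at the output.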



