hc-dev mailing list archives

From o.kalnichev...@dplanet.ch
Subject Re: The use of UTIUtil.toUsingCharset?
Date Thu, 20 Feb 2003 20:09:02 GMT

>That's the crux of the matter right there.  What the 
>target.getBytes(fromCharset) does is ask the original "target" Unicode 
>String (presumably containing % escapes) to convert itself to its byte 
>representation in the original charset.  Then "new String(..., 
>toCharset)" creates a new Unicode string while pretending those very same 
>bytes we just created are in "toCharset", which is presumably a 
>different charset.  Any Unicode characters that have different encodings 
>in those two character sets will end up changing in the second string, 
>because the bytes will be written into the byte array using one 
>character set, and then interpreted using another character set.  And 
>since some character set encodings are stateful, it's conceivable that 
>you could even have "fromCharset" and "toCharset" values that caused the 
>new String construction to blow up because the byte array was invalid 
>for the toCharset decoder.
>The part I'm having trouble with is *why* you'd want to do this.  The 
>whole point of Unicode (or one of them) is so that you don't have to 
>remember what charset your byte arrays are in.  Once you convert from a 
>String to a byte array, you need to preserve the charset of that byte 
>array.  Suddenly pretending it's in a different charset is just going to 
>screw things up.
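
For anyone following along, the effect described in the quoted text can be sketched in a few lines of Java. This is only an illustration under assumptions: the class name CharsetReinterpretDemo and the helper reinterpret are hypothetical stand-ins and not the library's actual code, and the charset names and sample strings are picked just for the demo.

```java
import java.nio.charset.Charset;

public class CharsetReinterpretDemo {

    // Hypothetical helper mirroring the pattern under discussion:
    // encode with one charset, then decode the same bytes with another.
    public static String reinterpret(String target, String fromCharset, String toCharset) {
        byte[] bytes = target.getBytes(Charset.forName(fromCharset));
        return new String(bytes, Charset.forName(toCharset));
    }

    public static void main(String[] args) {
        // "café": the last character is outside US-ASCII.
        String original = "caf\u00e9";
        String mangled = reinterpret(original, "UTF-8", "ISO-8859-1");

        // UTF-8 encodes U+00E9 as two bytes; ISO-8859-1 decodes each byte
        // as its own character, so the string changes and grows by one.
        System.out.println(original.equals(mangled)); // prints "false"
        System.out.println(mangled.length());         // prints "5"

        // A pure-ASCII string (for example one that is already %-escaped)
        // has the same bytes in both charsets and survives unchanged.
        String escaped = "abc%20def";
        System.out.println(escaped.equals(reinterpret(escaped, "UTF-8", "ISO-8859-1"))); // prints "true"
    }
}
```

Only characters whose byte encodings differ between the two charsets are affected, which is why a fully escaped ASCII string passes through untouched. Note also that in current JDKs this String constructor is documented to substitute a replacement character for malformed input rather than throw.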

I really appreciate your response. I can't say I completely agree with your point (or understand
it), but so be it. 

Had Sung-Su not refused to provide a simple unit test case for this method, this discussion
would have been put to an end a few months ago. But apparently writing test cases is for losers.

Kind regards

