hc-dev mailing list archives

From Eric E Johnson <e...@tibco.com>
Subject Re: The use of URIUtil.toUsingCharset?
Date Thu, 20 Feb 2003 22:11:53 GMT
Laura Werner wrote:

> How about if we just deprecate the @#% thing and the two URIUtil 
> methods that call it? 

I found four uses (not just two), and HttpClient itself does not use any 
of those four methods.  My vote is to deprecate them away.

As for why these functions exist, I keep thinking along these lines: 
imagine you want to encode foreign-language characters in a URL.  The 
way to do it is to convert your string into bytes, and then URL-encode 
the bytes as if they were ASCII.  To reverse the process, take your URL, 
decode it into ASCII, treat each character as a byte, and then convert 
those bytes back via the expected encoding.  So you can imagine that the 
first step would be precisely what these routines do: convert a String 
into bytes in encoding XXX, and then back into a String in encoding 
YYY, where YYY is almost certainly ASCII.  Having done that, you can use 
all your functions that URL-encode a String instead of writing an 
additional function that takes bytes.  Unfortunately, if the encoding 
YYY has any characters outside the 0-255 range, you'd be hosed, and the 
documentation doesn't say that.
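
To make the round trip concrete, here is a minimal Java sketch.  It is 
not HttpClient code: the class and method names are hypothetical, the 
set of characters left unescaped is an assumption, and reinterpret() 
only mirrors the String -> bytes (XXX) -> String (YYY) step as described 
above, not the actual URIUtil implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of HttpClient.
final class PercentEncodingSketch {

    // Assumption: only letters, digits, '-', '_', '.', and '~' stay unescaped.
    private static boolean isUnreserved(int v) {
        return (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
            || (v >= '0' && v <= '9')
            || v == '-' || v == '_' || v == '.' || v == '~';
    }

    // Convert the text to bytes in charset xxx, then percent-encode each byte.
    static String encode(String text, Charset xxx) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(xxx)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (isUnreserved(v)) {
                out.append((char) v);
            } else {
                out.append('%');
                out.append(Character.toUpperCase(Character.forDigit(v >> 4, 16)));
                out.append(Character.toUpperCase(Character.forDigit(v & 0xF, 16)));
            }
        }
        return out.toString();
    }

    // Reverse: turn "%HH" escapes and plain ASCII characters back into bytes,
    // then decode those bytes with charset xxx.
    // Sketch only: assumes well-formed, pure-ASCII input.
    static String decode(String encoded, Charset xxx) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else {
                bytes.write(c); // write(int) keeps only the low 8 bits
            }
        }
        return new String(bytes.toByteArray(), xxx);
    }

    // The intermediate step described above: bytes in xxx, read back as yyy.
    static String reinterpret(String text, Charset xxx, Charset yyy) {
        return new String(text.getBytes(xxx), yyy);
    }
}
```

With UTF-8 as XXX, encode("\u00e9", UTF_8) gives "%C3%A9", and decode() 
turns that back into the original character.  reinterpret() shows where 
the intermediate String can go wrong: reading those two UTF-8 bytes as 
ISO-8859-1 keeps one character per byte, but reading them as US-ASCII 
loses them, because Java substitutes U+FFFD for bytes that are not 
valid ASCII.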

I think the W3C's official word on this is to use UTF-8 for the XXX 
encoding, but I don't know the link off the top of my head.

Maybe I'm getting the application wrong, though.

-eej.

P.S. I changed my name on the send line, so as to avoid being confused with 
the newcomer also known as Eric Johnson.  Just my luck.  I bet some of 
us share the same birthday too.  If only I contributed enough to be 
blessed with a Middle-Earth name, then I wouldn't have to worry about 
ambiguity!

