hc-dev mailing list archives

From "Sung-Gu" <jeri...@apache.org>
Subject Re: [VOTE] Re: 2.0 release - deprecate some methods?
Date Thu, 26 Jun 2003 13:39:23 GMT

----- Original Message ----- 
From: "Adrian Sutton" <adrian@intencha.com>

> If you don't know why the code would be useful or what it was
> implemented based upon, why is it that you still want it in HttpClient?
>   There is nothing that uses those methods anywhere in HttpClient  and
> the presence of an FTP RFC that requires them still wouldn't make them
> applicable to HttpClient since we aren't dealing with FTP.

It's not confined to only FTP.   It's for every internet 'application layer'

> String temporary = URIUtil.toUsingCharset(input, "UTF-8", "Big5");
> String result = URIUtil.toUsingCharset(temporary, "Big5", "UTF-8");
> assertEquals(input, result);
>   * \u4E01 is a Chinese character.  You can substitute \uCBBF for a wide
> range of Chinese characters and the test will still fail.
>   * Big5 is a very commonly used charset for Chinese characters.

The first step in the process can be performed by maintaining a mapping
table that includes the local character set code and the corresponding UCS
The next step is to convert the UCS character code to the UTF-8 encoding.

Hmmm.... I don't know about Big5 though...
I guess Big5 is not a UCS.   The second step needs Unicode as its input.
If you want to map Big5 to a UCS automatically, you should perhaps insert
some code into the toUsingCharset method.
Some might work without UCS transformation though, it must be required I
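
As a rough sketch of that two-step process in Java (this is not code from
HttpClient): the class and method names below are made up for illustration,
ISO-8859-1 stands in for the local charset, and StandardCharsets assumes
Java 7 or later. Decoding the bytes into a String is the mapping to Unicode
characters; encoding that String is the conversion to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
public class TwoStepConversion {

    static byte[] localToUtf8(byte[] localBytes, Charset localCharset) {
        // Step 1: map the local character codes to Unicode characters.
        String unicode = new String(localBytes, localCharset);
        // Step 2: encode the Unicode characters as UTF-8.
        return unicode.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xE9 }; // U+00E9 (e with acute) in ISO-8859-1
        byte[] utf8 = localToUtf8(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(utf8)); // prints [-61, -87], i.e. 0xC3 0xA9
    }
}
```

The same shape should apply to any charset the JVM supports; whether Big5 is
available depends on the installed runtime, so it is not used here.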

> If you read the JavaDoc for the String constructor being used
> (String(byte[], String)), it says:
> "Constructs a new String by decoding the specified array of bytes using
> the specified charset."
> Note the use of the word "decoding" which means that instead of
> creating a String backed by the given byte array, it uses the specified
> charset to convert the bytes into actual characters - conceptually
> these characters have no particular encoding since they are
> (conceptually) the actual characters rather than a byte representation
> of the characters.  In reality, the characters are represented in
> memory by a series of bytes in UTF-8 encoding as required by the JVM
> specification.

UTF-8 is a transformation charset, not really a display charset.
I guess it's not always what the String class uses in Java.

> Secondly, the toUsingCharset method cannot work in most situations
> because it converts the string to bytes using one encoding and then
> converts those bytes to a String using a different encoding.  To
> highlight why this cannot work, create a text file and save it to disk
> using ASCII encoding.  Then, attempt to read the file back in as EBCDIC
> encoding (or any double-byte character charset like UTF-16), the text

EBCDIC is also not a UCS.

> will have become corrupted because the bytes were mapped to characters
> using the wrong charset (a charset is simply a mapping between bytes
> and characters).
> So, the possible ways for toUsingCharset to fulfill its contract are
> for it to be changed to:
> public String toUsingCharset(String target, String fromCharset, String
> toCharset) {
> return target;
> }
> OR to:
> public byte[] toUsingCharset(String target, String toCharset) {
> return target.getBytes(toCharset);
> }
> OR to:
> public byte[] toUsingCharset(byte[] target, String fromCharset, String
> toCharset) {
> return new String(target, fromCharset).getBytes(toCharset);
> }
> The last one is the only one that makes any sense at all, but I fail to
> see how it is useful in HttpClient.

Well... it should be a byte transformation,
like from the source charset to the target charset.

Your first two examples look like just a one-way ticket to me.
Probably it might work?
Or the last one is similar though... I'm not sure...
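
To compare the two shapes side by side, here is a small sketch. It assumes
the String-to-String variant behaves as described in the quoted text above
(encode with one charset, decode with another); the class and method names
are hypothetical, and ISO-8859-1 is used only because every JVM ships it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison, not the HttpClient source.
public class TranscodeSketch {

    // String-to-String shape: the bytes are produced with one charset
    // and then read back with a different one.
    static String stringRoundTrip(String target, Charset from, Charset to) {
        return new String(target.getBytes(from), to);
    }

    // Byte-to-byte shape: decode with the source charset, encode with the target.
    static byte[] transcode(byte[] target, Charset from, Charset to) {
        return new String(target, from).getBytes(to);
    }

    public static void main(String[] args) {
        String input = "\u00E9"; // e with acute
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;

        // The String variant changes the text itself.
        System.out.println(input.equals(stringRoundTrip(input, utf8, latin1))); // prints false

        // The byte variant re-encodes the same text: one byte, 0xE9.
        byte[] asLatin1 = transcode(input.getBytes(utf8), utf8, latin1);
        System.out.println(asLatin1.length); // prints 1
    }
}
```

In this sketch the byte variant keeps the characters intact as long as both
charsets can represent them, while the String variant returns different text.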

> So Sung-Gu, please provide some justification for your -1 in terms of
> why the methods should remain in HttpClient - in particular where in
> HttpClient the method would be used and for what purpose.

As I mentioned previously...  for example, a new method, perhaps called
'toAnotherDisplay', could use the toUsingCharset method to change what is
displayed when you change the encoding directly in your web browser...

> Regards,
> Adrian Sutton.

Hope to be helpful,

