hc-dev mailing list archives

From "Sung-Gu" <jeri...@apache.org>
Subject Re: The use of UTIUtil.toUsingCharset?
Date Mon, 27 Jan 2003 05:01:07 GMT
Hi,

I'm sorry, I had missed your point.
You're interested only in single-byte encodings together with Unicode.
I hadn't realized that.

That's why you haven't seen a correct use and display of that method.
I had guessed as much, though. (That's why I tried to display the byte code values.)

I'd also like to point out that your examples below are not
a correct use of the method. They're meaningless.
For display (which is what you want, I guess), you should use a code set
or charset supported by your operating system, or ISO-8859-1.
UTF-8 is meant to be used only as a transformation
for storage and transmission.
If you want to use Unicode for display, ISO-10646 is
fully supported, and the transformation to UTF-8 should be applied
from UCS.
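
A minimal sketch of the distinction drawn above, using only the JDK: the charset used for display or transmission belongs to the output stream, not to the String itself. The class name DisplayEncoding and the helper printAs are hypothetical and only illustrate the idea; they are not part of HttpClient.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class DisplayEncoding {
    // Print the text through a stream that encodes with the named charset
    // and return the bytes a terminal, file, or socket would receive.
    public static byte[] printAs(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String city = "Z\u00fcrich"; // "Zürich", escaped to avoid source-encoding issues
        System.out.println(printAs(city, "ISO-8859-1").length); // 6: one byte per character
        System.out.println(printAs(city, "UTF-8").length);      // 7: the u-umlaut takes two bytes
        System.out.println(printAs(city, "US-ASCII").length);   // 6: the u-umlaut becomes '?'
    }
}
```

The String stays the same in all three calls; only the stream's charset decides which bytes are produced, so the charset to choose is the one the receiving terminal or protocol expects.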

I've added it as a TODO comment for simple diagram 2 in the text file.
It wasn't really my concern before.
(As you know, I'm interested in double-byte encodings,
and that would be the general way to solve character encoding.)
I'll do it sometime later.

Sung-Gu

----- Original Message -----
From: <o.kalnichevski@dplanet.ch>
Subject: Re: The use of UTIUtil.toUsingCharset?


Please take no offense, but the URIUtil.toUsingCharset method still does not
make even the slightest sense to me. Your example shows how to invoke this
method but does not explain what it is useful for, apart from garbling
Unicode strings.

Have a look at a simpler example. Here I attempt to (supposedly) convert
"Zürich" from one encoding into another. However, as you can see,
URIUtil.toUsingCharset() always produces garbage.

===================================================================
import org.apache.commons.httpclient.util.URIUtil;

public class ToUsingCharsetDemo
{
  // Each call passes the string, a source charset, and a target charset.
  public static void main(String[] args) throws Exception
  {
    System.out.println(
      URIUtil.toUsingCharset("Zürich", "UTF-8", "US-ASCII"));
    System.out.println(
      URIUtil.toUsingCharset("Zürich", "ASCII", "UTF-8"));
    System.out.println(
      URIUtil.toUsingCharset("Zürich", "UTF-8", "ISO-8859-1"));
    System.out.println(
      URIUtil.toUsingCharset("Zürich", "ISO-8859-1", "UTF-8"));
  }
}


Output:

Z��rich
Z?rich
ZÃ&#131;¼rich
Z�

=================================================================

Java uses 16 bits to represent characters. Therefore the concept of character
encoding is only applicable when working with arrays of bytes, 8-bit units,
that represent a sequence of characters. One indeed needs to take character
encoding into account when converting from byte[] to String or vice versa.
However, converting a Unicode String to an array of bytes and back to a
Unicode String using a different encoding (especially in one method call),
in my opinion, does not produce any sensible results.
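
The point above can be sketched with plain JDK calls. This is an illustration only, not the implementation of URIUtil.toUsingCharset(); the class name CharsetRoundTrip and the helper recode are hypothetical. It encodes a String to bytes with one charset and decodes those bytes with another, which only gives back the original text when the two charsets match.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Encode the text to bytes with one charset, then decode with another.
    public static String recode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String city = "Z\u00fcrich"; // "Zürich", escaped to avoid source-encoding issues

        // The same String yields different byte sequences per charset.
        System.out.println(city.getBytes(StandardCharsets.UTF_8).length);      // 7
        System.out.println(city.getBytes(StandardCharsets.ISO_8859_1).length); // 6

        // Matching charsets: the round trip is lossless.
        String same = recode(city, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(city.equals(same)); // true

        // Mismatched charsets: the two UTF-8 bytes of the u-umlaut are read
        // as two separate ISO-8859-1 characters, so the result is garbled.
        String garbled = recode(city, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(city.equals(garbled)); // false
        System.out.println(garbled.length());     // 7
    }
}
```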

If you see things differently, please help me understand what
URIUtil.toUsingCharset() can be USEFUL for.

Cheers

Oleg
