hc-dev mailing list archives

From André-John Mas <aj...@newtrade.com>
Subject Re: [VOTE] Re: 2.0 release - deprecate some methods?
Date Thu, 26 Jun 2003 15:23:42 GMT
This doesn't look correct. If you really want to convert from one
charset to another, you have to do something like this:

    String myString = new String(bytes,bytesCharset);
    byte[] bytes2 = myString.getBytes(destCharset);

Until you have the bytes, you don't have the final output, since
any conversion between strings and bytes falls back to the platform's
native encoding if you don't name a charset. Otherwise, if your
destination is an OutputStream, let the OutputStreamWriter do the
work for you:

    String myString = new String(bytes, bytesCharset);
    OutputStreamWriter out =
        new OutputStreamWriter(outStream, destCharset);
    out.write(myString);
    out.flush();  // push the encoded bytes through to outStream
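
Putting the two approaches side by side, here is a minimal sketch,
assuming a current JDK where java.nio.charset.StandardCharsets is
available. The class name TranscodeSketch and the method names
transcode and transcodeTo are made up for illustration and are not
part of HttpClient.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class TranscodeSketch {

    // Decode the bytes with the charset they were written in,
    // then encode the resulting String with the destination charset.
    static byte[] transcode(byte[] bytes, Charset from, Charset to) {
        String text = new String(bytes, from);
        return text.getBytes(to);
    }

    // Same conversion, but the OutputStreamWriter does the encoding
    // while writing to the destination stream.
    static void transcodeTo(byte[] bytes, Charset from,
                            OutputStream out, Charset to) throws IOException {
        String text = new String(bytes, from);
        Writer writer = new OutputStreamWriter(out, to);
        writer.write(text);
        writer.flush(); // make sure the encoded bytes reach the stream
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = { (byte) 0xE9 }; // U+00E9 encoded as ISO-8859-1

        byte[] utf8 = transcode(latin1,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8));

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        transcodeTo(latin1, StandardCharsets.ISO_8859_1,
                sink, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, sink.toByteArray()));
    }
}
```

Both paths go through a String in the middle, and both name the
charset on each side of it, so neither depends on the platform
default.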

I have just had to write a project that is fully UTF-8 compliant,
and it taught me a lot about what Java does. Without an encoding
specified, string conversions default to the platform's native
encoding, which is not always what you want. I had to go through
the whole code base and make sure the right conversions were being
performed.
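
To illustrate why the default matters, here is a small sketch,
assuming a current JDK. The class name DefaultCharsetSketch and the
method encodedLength are hypothetical. The same four-character string
comes out as a different number of bytes depending on the charset, so
code that calls getBytes() with no argument produces different output
on differently configured machines.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class DefaultCharsetSketch {

    // Number of bytes the text occupies once encoded in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // four chars, the last one is non-ASCII

        // Whatever this JVM was configured with; varies by platform.
        System.out.println("platform default: " + Charset.defaultCharset());

        // No charset named: the result depends on the line printed above.
        System.out.println("default:    " + Arrays.toString(text.getBytes()));

        // Charset named: the result is the same everywhere.
        System.out.println("UTF-8:      "
                + Arrays.toString(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1: "
                + Arrays.toString(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```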

regards

Andre

Laura Werner wrote:

> Adrian Sutton wrote:
> 
>> The flaw in the toUsingCharset method is two-fold:
>> Firstly, Strings in Java are *always* stored internally as UTF-8
> 
> I agree with the rest of your analysis of this, but I thought I should 
> point out that Java Strings and "char"s are stored in UTF-16 rather than 
> UTF-8.  A "char" is an unsigned, two-byte value that can hold all the 
> characters from UCS2.
> 
> As far as toUsingCharset goes, I agree that it looks broken.  The code 
> basically does:
> 
>            return new String(target.getBytes(fromCharset), toCharset);
> 
> It's taking "target", which is a UTF-16 string, encoding it into a byte 
> array in "fromCharset", and then decoding those bytes back into UTF-16 
> using "toCharset".  So it's pretending the bytes in the array have two 
> different meanings, one when it writes them and one when it reads them 
> immediately afterward.  I can't see how this could be correct.
> 
> -- Laura
> 
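
The behaviour Laura describes in the quoted text can be reproduced
with a short sketch. This assumes a current JDK, and the class name
RoundTripSketch and method name brokenConvert are made up; the method
body only mirrors the one-line summary quoted above, not the actual
HttpClient source.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class RoundTripSketch {

    // Mirrors the quoted one-liner: encode with one charset,
    // then decode the very same bytes with a different one.
    static String brokenConvert(String target, Charset from, Charset to) {
        return new String(target.getBytes(from), to);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // a single non-ASCII character

        String mangled = brokenConvert(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // The two UTF-8 bytes 0xC3 0xA9 are each read as one
        // ISO-8859-1 character, so one char becomes two.
        System.out.println(original.length() + " -> " + mangled.length());
        System.out.println(mangled.equals("\u00c3\u00a9"));
    }
}
```

The string only survives when both charsets agree on the bytes
involved, for example when the text is pure ASCII.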


-- 
André-John Mas
Software Developer / Développeur Informatique
Newtrade Technologies
63 de Brésoles, Suite 100, Montreal, Quebec, Canada H2Y 1V7
mailto:ajmas@newtrade.com
tel +1 514 286-8187 x3017
fax +1 514 221-3287
