geronimo-dev mailing list archives

From "Vamsavardhana Reddy" <c1vams...@gmail.com>
Subject Re: Any character encoding experts out there?
Date Fri, 24 Feb 2006 05:02:19 GMT
Hi Rick,

See the byte to character map at
http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx .

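Char \u0081 is not mapped to any byte in Cp1252, so String.getBytes("Cp1252")
returns a '?', which has byte value decimal 63.  The same will happen with
any char that is not mapped.  The data is therefore lost at the encoding
step, not in the String constructor: decoding byte 63 correctly gives '?'.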
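
To catch this before the substitution happens, something along the following
lines may help.  It is only a minimal sketch: the class name Cp1252Check and
the helper names isEncodable/encodeStrict are made up for illustration and are
not part of javamail.  It relies on the standard java.nio.charset encoder,
which can either be asked whether a string is encodable or be told to report
unmappable chars instead of replacing them with '?'.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class, for illustration only.
class Cp1252Check {

    // True if every char of s has a mapping in the named charset.
    public static boolean isEncodable(String s, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    // Encodes s, but throws instead of silently substituting '?'
    // when a char has no mapping in the charset.
    public static byte[] encodeStrict(String s, String charsetName)
            throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String longString = "Yada, yada\u0081";

        // Default behaviour: the unmappable char is replaced by '?' (63).
        byte[] lossy = longString.getBytes("Cp1252");
        System.out.println("last byte = " + lossy[lossy.length - 1]);

        // Check up front whether the round trip can be lossless.
        System.out.println("encodable = " + isEncodable(longString, "Cp1252"));

        // Strict encoding reports the problem instead of hiding it.
        try {
            encodeStrict(longString, "Cp1252");
        } catch (CharacterCodingException e) {
            System.out.println("cannot encode: " + e);
        }
    }
}
```

Whether to reject such input or fall back to a charset that can represent it
(UTF-8, for example) depends on what the caller of encodeWord() expects.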

Regards,
Vamsi

On 2/23/06, Rick McGuire <rickmcg@gmail.com> wrote:
>
> I'm currently trying to sort out a problem with my implementation of the
> MimeUtility class in the javamail specs.  For the
> encodeWord()/decodeWord() methods, my encoding encodes to the same value
> as the Sun implementation, but the decoding is driving me nuts.  I'm
> able to successfully decode this into what should be the correct byte[]
> array, but when used to instantiate the String value, I'm getting a
> bogus character value.
>
> Playing around with this, I've discovered that the problem seems to be
> occurring with the String constructor, and can be demonstrated without
> using the javamail code at all.  Here is a little snippet that shows the
> problem:
>
>         String longString = "Yada, yada\u0081";
>
>         try {
>             // get the bytes using CP1252
>             byte[] bytes = longString.getBytes("Cp1252");
>
>             // create a new string item using the same code page.
>             String newString = new String(bytes, 0, bytes.length, "Cp1252");
>
>             // last char of original is int 129, last char of rebuilt
>             // string is int 63.
>             System.out.println(">>>>> original string = " + longString
>                 + " rebuilt string = " + newString);
>             System.out.println(">>>>> original string = "
>                 + (int)longString.charAt(longString.length() - 1)
>                 + " rebuilt string = "
>                 + (int)newString.charAt(longString.length() - 1));
>         } catch (Exception e) {
>         }
>
> 63 is the last value in the byte array after the getBytes() call, and
> the Sun impl of MimeUtility.encodeWord() returns the string
> "=?Cp1252?Q?Yada,_yada=3F?=" (0x3F == 63), so the correct value is
> getting extracted.  I'm at a loss to figure out why the round trip coded
> above is corrupting the data.  What am I missing here?
>
> Rick
>
>
