harmony-dev mailing list archives

From Richard Liang <richard.lian...@gmail.com>
Subject Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset
Date Fri, 07 Apr 2006 12:10:07 GMT
Dmitry M. Kononov wrote:
> Hi Richard,
>
> On 4/6/06, Richard Liang <richard.liangyx@gmail.com> wrote:
>
>> Dmitry M. Kononov wrote:
>>> As you correctly noticed, the cause of this issue is that Harmony uses
>>> the little-endian byte order if an encoded UTF-16 sequence has no
>>> byte-order mark. However, the spec addresses this case explicitly:
>>>
>>> "When decoding, the UTF-16 charset interprets a byte-order mark to
>>> indicate the byte order of the stream but defaults to big-endian if
>>> there is no byte-order mark; when encoding, it uses big-endian byte
>>> order and writes a big-endian byte-order mark."
>>>
>> Hello Dmitry,
>>
>> Yes, although Harmony and RI use different byte order, as both Harmony
>> and RI use byte-order mark (U+FEFF), I think both Harmony and RI are
>> compliant with the specification. So could we regard Harmony-308 as "not
>> a bug"?
>>     
>
>
> I think Harmony's behavior in this case is inconsistent with the Java spec,
> since the spec defines the expected behavior explicitly:
> "when encoding, it uses big-endian byte order and writes a big-endian
> byte-order mark." But Harmony's encode() returns bytes in little-endian
> order.
>
> It seems I do not understand why you think Harmony follows the spec
> correctly in this case. :)
> I am really sorry for my misunderstanding.
>
>   
You're right, Dmitry. :-) Now I agree with you that Harmony is not compliant
with the specification. We will discuss this with our charset provider, ICU,
to determine how to fix the issue. Thanks a lot.

> From a test case attached to the HARMONY-308:
>
> 1) We have a char array that has no byte-order mark:
>     private static final char chars[] = {
>         0x041b,0x0435,0x0442,0x043e,0x0020,0x0432,0x0020,0x0420,0x043e,0x0441,
>         0x0441,0x0438,0x0438};
>
> 2) We have a byte array that encode() should return as we expect.
>     private static final byte bytes[] = {
>         (byte)254,(byte)255,(byte)  4,(byte) 27,(byte)  4,(byte) 53,(byte)  4,
>         (byte) 66,(byte)  4,(byte) 62,(byte)  0,(byte) 32,(byte)  4,(byte) 50,
>         (byte)  0,(byte) 32,(byte)  4,(byte) 32,(byte)  4,(byte) 62,(byte)  4,
>         (byte) 65,(byte)  4,(byte) 65,(byte)  4,(byte) 56,(byte)  4,(byte) 56};
>
> Please note, according to the spec we expect bytes returned by encode() in
> big-endian byte order. So, we expect the FEFF byte-order mark.
> Do you agree this expectation is correct and consistent with the spec?
>
> Thanks.
> --
> Dmitry M. Kononov
> Intel Managed Runtime Division
>
>   
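
For reference, the expectation in the quoted test case can be checked with
a small standalone program along the following lines. This is only a
sketch, not the test attached to HARMONY-308: the class name
Utf16EncodeCheck and the helper encodeUtf16 are made up for illustration,
while the char and byte values are the ones quoted above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical class name; not part of the HARMONY-308 attachment.
public class Utf16EncodeCheck {

    // Input from the quoted test case: no byte-order mark in the chars.
    static final char[] CHARS = {
        0x041b, 0x0435, 0x0442, 0x043e, 0x0020, 0x0432, 0x0020, 0x0420,
        0x043e, 0x0441, 0x0441, 0x0438, 0x0438};

    // Expected output per the spec: big-endian BOM (FE FF) followed by
    // each char in big-endian byte order.
    static final byte[] EXPECTED = {
        (byte) 254, (byte) 255, 4, 27, 4, 53, 4, 66, 4, 62, 0, 32, 4, 50,
        0, 32, 4, 32, 4, 62, 4, 65, 4, 65, 4, 56, 4, 56};

    // Hypothetical helper: encode with the "UTF-16" charset and copy the
    // resulting ByteBuffer contents into a plain byte array.
    static byte[] encodeUtf16(char[] chars) {
        ByteBuffer buf = Charset.forName("UTF-16").encode(CharBuffer.wrap(chars));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] actual = encodeUtf16(CHARS);
        System.out.println("matches spec: " + Arrays.equals(EXPECTED, actual));
    }
}
```

An implementation that follows the quoted spec text should report a match.
One that writes a little-endian mark (FF FE) and little-endian bytes, as
described in this thread for Harmony, would not.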


-- 
Richard Liang
China Software Development Lab, IBM 

