harmony-dev mailing list archives

From Richard Liang <richard.lian...@gmail.com>
Subject Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset
Date Thu, 06 Apr 2006 15:17:19 GMT
Dmitry M. Kononov wrote:
> Hi Richard,
> On 4/6/06, Richard Liang <richard.liangyx@gmail.com> wrote:
>> As described in the Unicode standard, UTF-16 can be serialized in either
>> big-endian or little-endian byte order; a leading byte sequence
>> corresponding to U+FEFF (the byte-order mark) distinguishes the two.
>> If the leading byte sequence is FE FF, the whole byte sequence is
>> regarded as big-endian.
>> If the leading byte sequence is FF FE, the whole byte sequence is
>> regarded as little-endian.
>> From your test, we can see that Harmony uses little-endian, while RI uses
>> big-endian.
>> I'm sorry if my explanation confused you :-)
> I absolutely agree with you. Thanks a lot for your explanation, and sorry
> for my brief description of the issue.
> As you correctly noticed, the cause of this issue is that Harmony uses
> little-endian byte order when an encoded UTF-16 sequence has no byte-order
> mark. However, the spec explicitly covers this case:
> "When decoding, the UTF-16 charset interprets a byte-order mark to indicate
> the byte order of the stream but defaults to big-endian if there is no
> byte-order mark; when encoding, it uses big-endian byte order and writes a
> big-endian byte-order mark."
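The byte-order rules quoted above can be sketched as a small BOM check. This is a minimal illustration, not Harmony's or the RI's actual decoder: the class `Utf16BomSniffer` and the method `detectByteOrder` are hypothetical names, and the sketch only looks at the first two bytes, falling back to big-endian when no BOM is present, as the quoted javadoc requires for decoding.

```java
import java.nio.ByteOrder;

public class Utf16BomSniffer {
    // Hypothetical helper: inspects the first two bytes for a UTF-16
    // byte-order mark (U+FEFF). FE FF means big-endian, FF FE means
    // little-endian; with no BOM it defaults to big-endian, as the
    // Charset javadoc specifies for decoding.
    static ByteOrder detectByteOrder(byte[] data) {
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return ByteOrder.BIG_ENDIAN;
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return ByteOrder.LITTLE_ENDIAN;
            }
        }
        return ByteOrder.BIG_ENDIAN; // no BOM: default to big-endian
    }

    public static void main(String[] args) {
        byte[] be = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};   // BOM + 'A', big-endian
        byte[] le = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};   // BOM + 'A', little-endian
        byte[] noBom = {0x00, 0x41};                           // 'A', no BOM
        System.out.println(detectByteOrder(be));    // BIG_ENDIAN
        System.out.println(detectByteOrder(le));    // LITTLE_ENDIAN
        System.out.println(detectByteOrder(noBom)); // BIG_ENDIAN
    }
}
```

The same three inputs passed to `new String(bytes, StandardCharsets.UTF_16)` should each decode to "A" on a spec-compliant implementation, since the decoder consumes the BOM and applies the big-endian default otherwise.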
Hello Dmitry,

Yes. Although Harmony and RI use different byte orders, both Harmony and
RI write a byte-order mark (U+FEFF), so I think both are compliant with
the specification. Could we therefore regard HARMONY-308 as "not
a bug"?
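One way to check which byte order an implementation emits is to print the raw output of `Charset.encode(CharBuffer)`, the method named in HARMONY-308. This is a minimal sketch; the class `Utf16EncodeCheck` and the helper `encodeToHex` are hypothetical names. Per the javadoc quoted above, encoding "A" with "UTF-16" should yield a big-endian BOM followed by a big-endian code unit, i.e. FE FF 00 41; a little-endian implementation would instead print FF FE 41 00.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class Utf16EncodeCheck {
    // Hypothetical helper: encodes s with the "UTF-16" charset and
    // returns the resulting bytes as space-separated hex pairs.
    static String encodeToHex(String s) {
        ByteBuffer bb = Charset.forName("UTF-16").encode(CharBuffer.wrap(s));
        StringBuilder sb = new StringBuilder();
        while (bb.hasRemaining()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bb.get() & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The javadoc requires a big-endian BOM and big-endian code units
        // when encoding, so a compliant implementation prints FE FF 00 41.
        System.out.println(encodeToHex("A"));
    }
}
```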
> Thanks.
>> --
>> Dmitry M. Kononov
>> Intel Managed Runtime Division

Richard Liang
China Software Development Lab, IBM 
