harmony-dev mailing list archives

From "Dmitry M. Kononov" <dmitry.m.kono...@gmail.com>
Subject Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset
Date Thu, 06 Apr 2006 12:45:55 GMT
Hi Richard,

On 4/6/06, Richard Liang <richard.liangyx@gmail.com> wrote:
> And as described in Unicode, UTF-16 can be encoded as either big-endian
> or little-endian, but a leading byte sequence corresponding to U+FEFF
> will be used to distinguish the two byte orders.
> If the leading byte sequence is FE FF, the whole byte sequence will be
> regarded as big-endian.
> If the leading byte sequence is FF FE, the whole byte sequence will be
> regarded as little-endian.
> From your test, we can see that Harmony uses little-endian, while the RI
> uses big-endian.
> I'm sorry if my explanation makes you confused :-)
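
Richard's description of BOM-driven decoding can be checked directly with java.nio. The following sketch is not from the original thread (the class name is made up for illustration); it decodes the same character prefixed with each of the two byte-order marks:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class BomDecodeDemo {
    public static void main(String[] args) {
        Charset utf16 = Charset.forName("UTF-16");

        // 'A' (U+0041) preceded by a big-endian BOM: FE FF 00 41
        CharBuffer be = utf16.decode(ByteBuffer.wrap(
                new byte[] {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41}));

        // 'A' preceded by a little-endian BOM: FF FE 41 00
        CharBuffer le = utf16.decode(ByteBuffer.wrap(
                new byte[] {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00}));

        // The decoder consumes the BOM and applies the byte order it
        // indicates, so both buffers decode to the same string.
        System.out.println(be + " " + le); // A A
    }
}
```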

I absolutely agree with you. Thanks a lot for the explanation, and sorry
for my brief description of the issue.

As you rightly noticed, the cause of this issue is that Harmony uses the
little-endian byte order when an encoded UTF-16 sequence has no byte-order
mark. However, the spec covers this case explicitly:

"When decoding, the UTF-16 charset interprets a byte-order mark to indicate
the byte order of the stream but defaults to big-endian if there is no
byte-order mark; when encoding, it uses big-endian byte order and writes a
big-endian byte-order mark."
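
The spec's encoding requirement is easy to observe. A minimal sketch (class name is hypothetical, not from the thread):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class Utf16EncodeDemo {
    public static void main(String[] args) {
        // Encode a single character with the "UTF-16" charset.
        ByteBuffer bb = Charset.forName("UTF-16").encode(CharBuffer.wrap("A"));

        // Per the spec, the encoder must use big-endian order and write a
        // big-endian byte-order mark, so 'A' (U+0041) encodes as FE FF 00 41.
        StringBuilder hex = new StringBuilder();
        while (bb.hasRemaining()) {
            hex.append(String.format("%02X ", bb.get()));
        }
        System.out.println(hex.toString().trim()); // FE FF 00 41
    }
}
```

On the RI this prints `FE FF 00 41`; the Harmony build under discussion produced the little-endian form instead, which is exactly what HARMONY-308 reports.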


--
Dmitry M. Kononov
Intel Managed Runtime Division
