harmony-dev mailing list archives

From "Dmitry M. Kononov" <dmitry.m.kono...@gmail.com>
Subject Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset
Date Thu, 06 Apr 2006 16:40:45 GMT
Hi Richard,

On 4/6/06, Richard Liang <richard.liangyx@gmail.com> wrote:

> Dmitry M. Kononov wrote:
> > As you correctly noticed, the cause of this issue is that Harmony uses
> > the little-endian byte order if an encoded UTF-16 sequence has no
> > byte-order mark. However, the spec covers this case explicitly:
> >
> > "When decoding, the UTF-16 charset interprets a byte-order mark to
> > indicate the byte order of the stream but defaults to big-endian if
> > there is no byte-order mark; when encoding, it uses big-endian byte
> > order and writes a big-endian byte-order mark."
> >
> >
> Hello Dmitry,
>
> Yes, although Harmony and RI use different byte order, as both Harmony
> and RI use byte-order mark (U+FEFF), I think both Harmony and RI are
> compliant with the specification. So could we regard Harmony-308 as "not
> a bug"?


I think Harmony's behavior in this case is inconsistent with the Java spec,
since the spec defines the expected behavior explicitly:
"when encoding, it uses big-endian byte order and writes a big-endian
byte-order mark." Harmony's encode(), however, returns bytes in
little-endian order.

I do not understand why you think Harmony follows the spec correctly in
this case. :)
I am sorry if I have misunderstood something.

From the test case attached to HARMONY-308:

1) We have a char array that has no byte-order mark:
    private static final char chars[] = {
        0x041b,0x0435,0x0442,0x043e,0x0020,0x0432,0x0020,0x0420,0x043e,0x0441,
        0x0441,0x0438,0x0438};

2) We have the byte array that we expect encode() to return:
    private static final byte bytes[] = {
        (byte)254,(byte)255,(byte)  4,(byte) 27,(byte)  4,(byte) 53,(byte)  4,
        (byte) 66,(byte)  4,(byte) 62,(byte)  0,(byte) 32,(byte)  4,(byte) 50,
        (byte)  0,(byte) 32,(byte)  4,(byte) 32,(byte)  4,(byte) 62,(byte)  4,
        (byte) 65,(byte)  4,(byte) 65,(byte)  4,(byte) 56,(byte)  4,(byte) 56};

Please note that, according to the spec, we expect encode() to return bytes
in big-endian byte order. So the output should start with the big-endian
byte-order mark, FE FF (the bytes 254, 255 above).
Do you agree this expectation is correct and consistent with the spec?
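
For illustration, the expectation can be written as a small standalone
check. This is only a sketch, not the test attached to the JIRA issue: the
class name Utf16EncodeOrderCheck and the helper encodeUtf16 are made up for
this example. Only the two arrays (from the test case) and the calls
Charset.forName and Charset.encode(CharBuffer) are taken as given.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical class name, used only for this illustration.
class Utf16EncodeOrderCheck {

    // Input from the HARMONY-308 test case; it contains no byte-order mark.
    static final char[] CHARS = {
        0x041b, 0x0435, 0x0442, 0x043e, 0x0020, 0x0432, 0x0020, 0x0420,
        0x043e, 0x0441, 0x0441, 0x0438, 0x0438};

    // Expected output: the mark FE FF, then each char high byte first.
    static final byte[] EXPECTED = {
        (byte) 254, (byte) 255, 4, 27, 4, 53, 4, 66, 4, 62, 0, 32, 4, 50,
        0, 32, 4, 32, 4, 62, 4, 65, 4, 65, 4, 56, 4, 56};

    // Hypothetical helper: encode with the "UTF-16" charset and copy the
    // readable part of the returned buffer into a plain array.
    static byte[] encodeUtf16(char[] chars) {
        ByteBuffer out = Charset.forName("UTF-16").encode(CharBuffer.wrap(chars));
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) {
        byte[] actual = encodeUtf16(CHARS);
        if (Arrays.equals(EXPECTED, actual)) {
            System.out.println("encode() matches the expected big-endian bytes");
        } else {
            System.out.println("encode() differs: " + Arrays.toString(actual));
        }
    }
}
```

An encoder that writes little-endian output would instead start with the
bytes FF FE and swap the two bytes of every char, so the comparison fails.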

Thanks.
--
Dmitry M. Kononov
Intel Managed Runtime Division
