harmony-dev mailing list archives

From Richard Liang <richard.lian...@gmail.com>
Subject Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset
Date Thu, 06 Apr 2006 04:37:13 GMT
Dmitry M. Kononov (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/HARMONY-308?page=all ]
>
> Dmitry M. Kononov updated HARMONY-308:
> --------------------------------------
>
>     Attachment: test9.java
>
>   
>> java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset
>> -----------------------------------------------------------------------------------------------------------------------
>>
>>          Key: HARMONY-308
>>          URL: http://issues.apache.org/jira/browse/HARMONY-308
>>      Project: Harmony
>>         Type: Bug
>>     
>
>   
>>   Components: Classlib
>>     Reporter: Dmitry M. Kononov
>>  Attachments: test9.java
>>
>> java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order.
>> Please look at the output of a test case that I am going to attach.
>> RI:
>> ---8<---
>> bb.order()=BE
>> cb.order()=LE
>> result.order()=BE
>> The result is
>> result = java.nio.HeapByteBuffer[pos=0 lim=28 cap=52]
>> bb = java.nio.HeapByteBuffer[pos=0 lim=28 cap=28]
>> The result is OK.
>> ---8<---
>> Harmony (At revision 391577):
>> ---8<---
>> bb.order()=BE
>> cb.order()=LE
>> result.order()=BE
>> The result is
>> result = java.nio.ReadWriteHeapByteBuffer, status: capacity=28 position=0 limit=28
>> bb = java.nio.ReadWriteHeapByteBuffer, status: capacity=28 position=0 limit=28
>> The result is not correct.
>> 0 elements are not equal (ffffffff != fffffffe)
>> 1 elements are not equal (fffffffe != ffffffff)
>> 2 elements are not equal (1b != 4)
>> 3 elements are not equal (4 != 1b)
>> 4 elements are not equal (35 != 4)
>> 5 elements are not equal (4 != 35)
>> 6 elements are not equal (42 != 4)
>> 7 elements are not equal (4 != 42)
>> 8 elements are not equal (3e != 4)
>> 9 elements are not equal (4 != 3e)
>> 10 elements are not equal (20 != 0)
>> 11 elements are not equal (0 != 20)
>> 12 elements are not equal (32 != 4)
>> 13 elements are not equal (4 != 32)
>> 14 elements are not equal (20 != 0)
>> 15 elements are not equal (0 != 20)
>> 16 elements are not equal (20 != 4)
>> 17 elements are not equal (4 != 20)
>> 18 elements are not equal (3e != 4)
>> 19 elements are not equal (4 != 3e)
>> 20 elements are not equal (41 != 4)
>> 21 elements are not equal (4 != 41)
>> 22 elements are not equal (41 != 4)
>> 23 elements are not equal (4 != 41)
>> 24 elements are not equal (38 != 4)
>> 25 elements are not equal (4 != 38)
>> 26 elements are not equal (38 != 4)
>> 27 elements are not equal (4 != 38)
>> ---8<---
>>     
>
>   
Hello Dmitry,

IMHO, you may be mixing up two different "byte order" concepts :-)

1. the byte order of ByteBuffer (ByteBuffer.order)
2. the byte order of byte sequences encoded by some CharsetEncoder, such 
as UTF-16

First, let's see the byte order for java.nio.ByteBuffer.

As described in the spec of java.nio.ByteBuffer:

This class defines six categories of operations upon byte buffers:
....
* Absolute and relative *get* and *put* methods that read and write 
values of other primitive types, translating them to and from sequences 
of bytes in a particular byte order;
.....

For example,
        ByteBuffer bb = ByteBuffer.allocate(10);
        bb.order(ByteOrder.LITTLE_ENDIAN);
        bb.putChar('A');
The bytes stored in the ByteBuffer will be: 41 00

        ByteBuffer bb = ByteBuffer.allocate(10);
        bb.order(ByteOrder.BIG_ENDIAN);
        bb.putChar('A');
The bytes stored in the ByteBuffer will be: 00 41
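
The two snippets above can be combined into one small runnable program.
This is only a sketch; the class name ByteBufferOrderDemo and the helper
names putCharWithOrder and hex are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferOrderDemo {
    // Writes one char into a fresh 2-byte buffer that uses the given
    // byte order, and returns the two bytes actually stored.
    static byte[] putCharWithOrder(char c, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.allocate(2);
        bb.order(order);
        bb.putChar(c);
        return bb.array();
    }

    // Formats bytes as upper-case hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "41 00"
        System.out.println(hex(putCharWithOrder('A', ByteOrder.LITTLE_ENDIAN)));
        // prints "00 41"
        System.out.println(hex(putCharWithOrder('A', ByteOrder.BIG_ENDIAN)));
    }
}
```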

Second, there are also byte order issues in some character encoding 
schemes, such as UTF-16, UTF-16LE and UTF-16BE.

For example,

Character 'A' can be encoded in UTF-16LE: 41 00
Character 'A' can be encoded in UTF-16BE: 00 41

If we use the java.nio.charset APIs, the encoded byte sequences will be 
saved into a ByteBuffer. But **here** ByteBuffer.order has no 
relationship to the encoded byte sequences: it only governs how 
multi-byte primitive values are read and written through methods such 
as putChar. A UTF-16LE encoded byte sequence can still be stored into a 
BIG_ENDIAN ordered ByteBuffer.
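
To check this, here is a rough sketch that encodes 'A' with an explicit
UTF-16LE or UTF-16BE encoder into buffers whose order is set the
"opposite" way. The class and method names (EncoderIgnoresBufferOrder,
encodeInto, hex) are hypothetical, and StandardCharsets assumes Java 7
or later.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncoderIgnoresBufferOrder {
    // Encodes text with the given charset into a ByteBuffer that has the
    // given ByteBuffer.order, then copies out the bytes that were written.
    static byte[] encodeInto(String text, Charset cs, ByteOrder order) {
        ByteBuffer out = ByteBuffer.allocate(text.length() * 2 + 2).order(order);
        CharsetEncoder encoder = cs.newEncoder();
        encoder.encode(CharBuffer.wrap(text), out, true);
        encoder.flush(out);
        out.flip();
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    // Formats bytes as upper-case hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-16LE bytes stored in a BIG_ENDIAN buffer: prints "41 00"
        System.out.println(hex(encodeInto("A", StandardCharsets.UTF_16LE, ByteOrder.BIG_ENDIAN)));
        // UTF-16BE bytes stored in a LITTLE_ENDIAN buffer: prints "00 41"
        System.out.println(hex(encodeInto("A", StandardCharsets.UTF_16BE, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

The output depends only on the charset, not on the order of the
destination buffer.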

And as described in the Unicode standard, UTF-16 can be encoded as 
either big-endian or little-endian, but a leading byte sequence 
corresponding to U+FEFF (the byte order mark) is used to distinguish 
the two byte orders.

If the leading byte sequence is FE FF, the whole byte sequence will be 
regarded as big-endian.
If the leading byte sequence is FF FE, the whole byte sequence will be 
regarded as little-endian.
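
The rule above can be sketched as a small check on the first two bytes.
The class name BomSniffer and the method detect are hypothetical, and
the "no BOM" result is my own addition for inputs that start with
neither sequence.

```java
public class BomSniffer {
    // Looks at the first two bytes of a UTF-16 byte sequence and reports
    // which byte order the leading U+FEFF mark indicates, if any.
    static String detect(byte[] bytes) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return "big-endian";
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return "little-endian";
            }
        }
        return "no BOM";
    }

    public static void main(String[] args) {
        // FE FF 00 41: prints "big-endian"
        System.out.println(detect(new byte[] {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41}));
        // FF FE 41 00: prints "little-endian"
        System.out.println(detect(new byte[] {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00}));
    }
}
```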

From your test, we can see that Harmony uses little-endian, while the 
RI uses big-endian.
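
For reference, a quick way to see which order a given runtime picks for
the plain "UTF-16" charset is to print the encoded bytes. This is only a
sketch with hypothetical names (Utf16DefaultOrder, encodeUtf16, hex); the
expected output in the comment assumes a runtime that follows the
java.nio.charset.Charset documentation, which says that UTF-16 encodes
in big-endian order and writes a big-endian byte order mark.

```java
import java.nio.charset.StandardCharsets;

public class Utf16DefaultOrder {
    // Encodes text with the plain "UTF-16" charset (no LE/BE suffix).
    static byte[] encodeUtf16(String text) {
        return text.getBytes(StandardCharsets.UTF_16);
    }

    // Formats bytes as upper-case hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // On a runtime that follows the documented behavior this prints
        // "FE FF 00 41": a big-endian byte order mark, then 'A'.
        System.out.println(hex(encodeUtf16("A")));
    }
}
```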

I'm sorry if my explanation is confusing :-)

-- 
Richard Liang
China Software Development Lab, IBM 

