harmony-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: UTF-8 decode broken for supplementary characters?
Date Wed, 01 Sep 2010 12:46:19 GMT
On Wed, Sep 1, 2010 at 5:43 AM, Deven You <devyoudw@gmail.com> wrote:

> I have run the test on Linux and got the same error. It seems to be due to
> our UTF-8 decoder. I will do more debugging to narrow down the root cause.
> Is anyone familiar with UTF-8? I hope I can get some help.
>
>
Looks like the problem is in UTF_8's decodeLoop where it does:

cArr[outIndex++] = (char) jchar;

and similar in the non-array case where it does:

out.put((char) jchar);

In this case, jchar holds the correct value of my code point (0x1d11e), but
the cast truncates it to a 16-bit 'char'. Instead, any value above 0xFFFF
needs to be split into a high/low surrogate pair.
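
A minimal sketch of what the split could look like for the array case. The
class and the helper putCodePoint are hypothetical names used for illustration
only, not the actual Harmony code, and the sketch assumes the caller has
already checked that the output array has room for two chars:

```java
public class SurrogateSplit {
    // Hypothetical helper (not Harmony's code): writes one decoded code point
    // into cArr starting at outIndex and returns the next free index.
    public static int putCodePoint(int codePoint, char[] cArr, int outIndex) {
        if (codePoint < 0x10000) {
            // BMP code point: fits in a single char, the plain cast is safe.
            cArr[outIndex++] = (char) codePoint;
        } else {
            // Supplementary code point: subtract 0x10000, then split the
            // remaining 20 bits into two 10-bit halves.
            int offset = codePoint - 0x10000;
            cArr[outIndex++] = (char) (0xD800 + (offset >> 10));   // high surrogate
            cArr[outIndex++] = (char) (0xDC00 + (offset & 0x3FF)); // low surrogate
        }
        return outIndex;
    }

    public static void main(String[] args) {
        char[] out = new char[2];
        int next = putCodePoint(0x1D11E, out, 0);
        // prints "2 chars: D834 DD1E"
        System.out.printf("%d chars: %04X %04X%n", next, (int) out[0], (int) out[1]);
    }
}
```

For 0x1d11e this gives the pair D834 DD1E, whereas the plain cast leaves only
0xD11E. Character.toChars(int) from java.lang produces the same pair and could
be used instead, at the cost of allocating a small array per code point.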

-- 
Robert Muir
rcmuir@gmail.com
