harmony-commits mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters
Date Thu, 02 Sep 2010 13:29:53 GMT

     [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated HARMONY-6640:
---------------------------------

    Attachment: HARMONY-6640.patch

Attached is an improved version of the patch I sent to the mailing list.

Below is the original simple test case I supplied:

  public void testUTF8() throws Exception {
    // U+1D11E: MUSICAL SYMBOL G CLEF
    String s = new StringBuilder().appendCodePoint(0x1D11E).toString();
    byte[] utf8 = s.getBytes("UTF-8");
    assertEquals(s, new String(utf8, 0, utf8.length, "UTF-8"));
  }

I also ran round-trip tests with randomly generated strings, but I'm not
set up to build Harmony on my machine, so I apologize for the lack of a
test case in the actual patch.
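
For readers who want to reproduce the round-trip check, here is a minimal
sketch of what such a test might look like. It is not the test that was
actually run; the class name RandomRoundTripSketch and its helper methods
are hypothetical, and unpaired surrogates are assumed to be excluded
because they cannot be encoded as well-formed UTF-8.

```java
import java.nio.charset.Charset;
import java.util.Random;

// Hypothetical helper, not part of HARMONY-6640.patch.
final class RandomRoundTripSketch {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Builds a string of the given number of random code points, skipping surrogates. */
    public static String randomString(Random random, int codePoints) {
        StringBuilder sb = new StringBuilder();
        int appended = 0;
        while (appended < codePoints) {
            int cp = random.nextInt(Character.MAX_CODE_POINT + 1);
            // Lone surrogates (U+D800..U+DFFF) are not valid scalar values; skip them.
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                continue;
            }
            sb.appendCodePoint(cp);
            appended++;
        }
        return sb.toString();
    }

    /** Encodes to UTF-8, decodes again, and reports whether the string survived. */
    public static boolean roundTrips(String s) {
        byte[] utf8 = s.getBytes(UTF8);
        return s.equals(new String(utf8, 0, utf8.length, UTF8));
    }

    public static void main(String[] args) {
        Random random = new Random(42L); // fixed seed so failures are reproducible
        for (int i = 0; i < 1000; i++) {
            String s = randomString(random, 20);
            if (!roundTrips(s)) {
                throw new AssertionError("round trip failed on iteration " + i);
            }
        }
        System.out.println("all round trips succeeded");
    }
}
```

On a runtime with the decoder bug, a failure would be expected as soon as
a generated string contains a code point above U+FFFF.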


> UTF8 decoder doesn't properly decode supplementary characters
> -------------------------------------------------------------
>
>                 Key: HARMONY-6640
>                 URL: https://issues.apache.org/jira/browse/HARMONY-6640
>             Project: Harmony
>          Issue Type: Bug
>          Components: Classlib
>    Affects Versions: 5.0M14
>         Environment: Windows Vista
>            Reporter: Robert Muir
>         Attachments: HARMONY-6640.patch
>
>
> When attempting to build Lucene, I discovered a problem with UTF-8 decoding.
> (This actually prevents our tests from even compiling without a workaround.)
> For any code point > 0xFFFF (a 4-byte UTF-8 sequence), the decoder doesn't
> properly split the decoded code point into a surrogate pair.
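
To illustrate the step the issue describes, the following sketch decodes
one 4-byte UTF-8 sequence into a code point and splits it into a UTF-16
surrogate pair. It is not the code from the patch or from Harmony's
decoder; the class and method names are hypothetical, and the input is
assumed to be well formed (no validation is performed).

```java
// Hypothetical illustration, not taken from HARMONY-6640.patch.
final class SupplementaryDecodeSketch {

    /** Decodes one well-formed 4-byte UTF-8 sequence (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx). */
    public static int decodeFourByte(byte b0, byte b1, byte b2, byte b3) {
        return ((b0 & 0x07) << 18)
             | ((b1 & 0x3F) << 12)
             | ((b2 & 0x3F) << 6)
             |  (b3 & 0x3F);
    }

    /** Splits a supplementary code point (> 0xFFFF) into its high and low surrogates. */
    public static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;                // 20 significant bits remain
        char high = (char) (0xD800 + (offset >> 10));    // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+1D11E: MUSICAL SYMBOL G CLEF
        byte[] utf8 = { (byte) 0xF0, (byte) 0x9D, (byte) 0x84, (byte) 0x9E };
        int cp = decodeFourByte(utf8[0], utf8[1], utf8[2], utf8[3]);
        char[] pair = toSurrogates(cp);
        // prints "U+1D11E -> D834 DD1E"
        System.out.printf("U+%X -> %04X %04X%n", cp, (int) pair[0], (int) pair[1]);
    }
}
```

The pair produced here matches what the standard library's
Character.toChars(0x1D11E) returns, which is a convenient cross-check.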

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

