harmony-commits mailing list archives

From "Anton Ivanov (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HARMONY-688) java.util.regex.Matcher does not support Unicode supplementary characters
Date Thu, 12 Oct 2006 13:27:53 GMT
    [ http://issues.apache.org/jira/browse/HARMONY-688?page=comments#action_12441730 ] 
            
Anton Ivanov commented on HARMONY-688:
--------------------------------------

Some tests may fail when run on the reference implementation (RI),
even though they must pass according to the Unicode specification.

For example, running PatternTest on the RI can produce the following
test failure:

testPredefinedClassesWithSurrogatesSupplementary
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertFalse(Assert.java:34)
at junit.framework.Assert.assertFalse(Assert.java:41)
at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
 
Here the test tries to find a surrogate character in the single code point \uD916\uDE27.
The relevant requirement is stated here:
http://www.unicode.org/reports/tr18/#Supplementary_Characters
 
"Surrogate pairs (or their equivalents in other encoding forms) are to be handled internally
as single code point values"
 
So we have to treat text as code points, not code units.
Here \uD916\uDE27 is one code point consisting of
two code units (two surrogate characters), so the search should find nothing.
The RI, however, does not treat this code point as a single unit. This is an RI bug,
and the behavior is wrong according to the technical report.
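
As a rough illustration of the difference between the code unit view and the code point view, here is a minimal Java sketch. The class name CodePointSketch and its helper methods are hypothetical and are not part of the Harmony patch or of PatternTest; the sketch only uses standard java.lang.String and java.lang.Character calls and does not reproduce the regex engine itself.

```java
public class CodePointSketch {

    // Number of UTF-16 code units (Java chars) in the text.
    static int codeUnitCount(String text) {
        return text.length();
    }

    // Number of Unicode code points; a well-formed surrogate pair counts once.
    static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    // Code unit view: true if any single char is a high or low surrogate.
    static boolean containsSurrogateCodeUnit(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c) || Character.isLowSurrogate(c)) {
                return true;
            }
        }
        return false;
    }

    // Code point view: true only if a surrogate is left unpaired, because
    // codePointAt combines a well-formed pair into one supplementary value.
    static boolean containsSurrogateCodePoint(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                return true;
            }
            i += Character.charCount(cp); // advance by 1 or 2 code units
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "\uD916\uDE27"; // the string discussed in this comment
        System.out.println("code units:      " + codeUnitCount(text));
        System.out.println("code points:     " + codePointCount(text));
        System.out.println("code unit view:  " + containsSurrogateCodeUnit(text));
        System.out.println("code point view: " + containsSurrogateCodePoint(text));
    }
}
```

For \uD916\uDE27 the code unit view reports a surrogate, while the code point view does not, because the pair combines into the single supplementary code point U+55A27. The code point view is the one the technical report asks for.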

This issue is a good candidate to be marked as a non-bug difference.

Thanks,
Anton

> java.util.regex.Matcher does not support Unicode supplementary characters
> -------------------------------------------------------------------------
>
>                 Key: HARMONY-688
>                 URL: http://issues.apache.org/jira/browse/HARMONY-688
>             Project: Harmony
>          Issue Type: Bug
>          Components: Classlib
>            Reporter: Richard Liang
>         Assigned To: Tim Ellison
>         Attachments: patch_src.txt, patch_src_corrected.txt, patch_tests.txt
>
>
> Hello Nikolay,
> The following test case passes on the RI but fails on Harmony.  Would you please have a look at this issue? Thanks a lot.
>     public void test_matcher() {
>         Pattern p = Pattern.compile("\\p{javaLowerCase}");
>         Matcher matcher = p.matcher("\uD801\uDC28");
>         assertTrue(matcher.find());
>     }
> Best regards,
> Richard

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
