harmony-commits mailing list archives

From "Anton Ivanov (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HARMONY-688) java.util.regex.Matcher does not support Unicode supplementary characters
Date Wed, 11 Oct 2006 12:39:28 GMT
     [ http://issues.apache.org/jira/browse/HARMONY-688?page=all ]

Anton Ivanov updated HARMONY-688:
---------------------------------

    Attachment: patch_src_corrected.txt

I corrected the patch (patch_src.txt) and attached it to the issue (patch_src_corrected.txt).
I verified that regex and luni tests pass normally with the patch applied. 

The newly created class SupplRangeSet.java contained a bug.
Its matches() method had the following code:

...
        if (stringIndex < strLength) {            
            char high = testString.charAt(stringIndex++);
            
            if (contains(high) && 
                    next.matches(stringIndex, testString, matchResult) > 0) {
                return 1;
            }
...

But simply returning 1 is wrong, even though the comment on method matches() in
AbstractSet.java says:

 "Checks if this node matches in given position and recursively call
  next node matches on positive self match. Returns positive integer if 
  entire match succeed, negative otherwise
  return -1 if match fails or n > 0;"

In fact, method matches() does not return just some positive n > 0. On a successful match
attempt, n is an offset. All the older classes of java.util.regex take this into account,
but I overlooked it in SupplRangeSet.java.
So I corrected method matches() of the SupplRangeSet class as follows:

...
        int offset = -1;

        if (stringIndex < strLength) {            
            char high = testString.charAt(stringIndex++);
            
            if (contains(high) && 
                    (offset = next.matches(stringIndex, testString, matchResult)) > 0) {
                return offset;
            }
...
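
To illustrate why the returned value matters, here is a minimal, self-contained sketch. It does not use Harmony's real classes: Node, End, CharNode, and matchAbc are hypothetical names, and the sketch assumes that the terminal node reports the index at which the match ended. With the flag set, each node passes its successor's result upward; with the flag cleared, each node returns the constant 1 as the buggy code did, and the caller loses the offset.

```java
// Hypothetical, simplified node chain; these are not Harmony's actual classes.
final class MatchOffsetSketch {

    /** Stand-in for a regex node: returns -1 on failure, otherwise the chain's result. */
    interface Node {
        int matches(int index, CharSequence input);
    }

    /** Terminal node. Assumption: it reports the index at which the match ended. */
    static final class End implements Node {
        @Override
        public int matches(int index, CharSequence input) {
            return index;
        }
    }

    /** Matches a single char, then asks the next node to match the rest. */
    static final class CharNode implements Node {
        private final char expected;
        private final Node next;
        private final boolean propagateOffset;

        CharNode(char expected, Node next, boolean propagateOffset) {
            this.expected = expected;
            this.next = next;
            this.propagateOffset = propagateOffset;
        }

        @Override
        public int matches(int index, CharSequence input) {
            if (index < input.length() && input.charAt(index) == expected) {
                // Mirrors the "> 0" check in the excerpt; one char has been
                // consumed here, so a successful end index is at least 1.
                int offset = next.matches(index + 1, input);
                if (offset > 0) {
                    // propagateOffset == false reproduces the "return 1" bug.
                    return propagateOffset ? offset : 1;
                }
            }
            return -1;
        }
    }

    /** Builds a chain for the literal "abc" and matches it at index 0. */
    static int matchAbc(String input, boolean propagateOffset) {
        Node chain = new CharNode('a',
                new CharNode('b',
                        new CharNode('c', new End(), propagateOffset),
                        propagateOffset),
                propagateOffset);
        return chain.matches(0, input);
    }

    public static void main(String[] args) {
        System.out.println(matchAbc("abc", true));   // offset from the end of the chain
        System.out.println(matchAbc("abc", false));  // constant 1, offset is lost
        System.out.println(matchAbc("abd", true));   // failure
    }
}
```

In this sketch both variants report success with a positive number, so a test that only checks for a positive result would not tell them apart; only a caller that uses the value as an offset sees the difference.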

Thanks,
Anton

> java.util.regex.Matcher does not support Unicode supplementary characters
> -------------------------------------------------------------------------
>
>                 Key: HARMONY-688
>                 URL: http://issues.apache.org/jira/browse/HARMONY-688
>             Project: Harmony
>          Issue Type: Bug
>          Components: Classlib
>            Reporter: Richard Liang
>         Assigned To: Tim Ellison
>         Attachments: patch_src.txt, patch_src_corrected.txt, patch_tests.txt
>
>
> Hello Nikolay,
> The following test case passes on RI, but fails on Harmony.  Would you please have a look
> at this issue? Thanks a lot.
>     public void test_matcher() {
>         Pattern p = Pattern.compile("\\p{javaLowerCase}");
>         Matcher matcher = p.matcher("\uD801\uDC28");
>         assertTrue(matcher.find());
>     }
> Best regards,
> Richard
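
As background to the quoted test case, the sketch below shows why the input needs supplementary-character support: the two chars form a surrogate pair that encodes a single code point, and only the code point as a whole is lowercase. The class name SupplementaryLowerCaseSketch and its helper methods are made up for illustration; the calls themselves are standard java.lang.Character, java.lang.String, and java.util.regex APIs.

```java
import java.util.regex.Pattern;

// Illustrative helper class; the name and methods are hypothetical.
final class SupplementaryLowerCaseSketch {

    // The input from the quoted test case: a high and a low surrogate.
    static final String INPUT = "\uD801\uDC28";

    /** Number of UTF-16 chars in the input. */
    static int charCount() {
        return INPUT.length();
    }

    /** Number of Unicode code points in the input. */
    static int codePointCount() {
        return INPUT.codePointCount(0, INPUT.length());
    }

    /** The code point formed by the surrogate pair. */
    static int codePoint() {
        return INPUT.codePointAt(0);
    }

    /** Lowercase check on the whole code point (int overload). */
    static boolean codePointIsLowerCase() {
        return Character.isLowerCase(codePoint());
    }

    /** Lowercase check on the high surrogate alone (char overload). */
    static boolean highSurrogateIsLowerCase() {
        return Character.isLowerCase(INPUT.charAt(0));
    }

    /** The same pattern and input as in the quoted test case. */
    static boolean regexFinds() {
        return Pattern.compile("\\p{javaLowerCase}").matcher(INPUT).find();
    }

    public static void main(String[] args) {
        System.out.println("chars: " + charCount());
        System.out.println("code points: " + codePointCount());
        System.out.println("code point: U+" + Integer.toHexString(codePoint()).toUpperCase());
        System.out.println("code point is lowercase: " + codePointIsLowerCase());
        System.out.println("high surrogate alone is lowercase: " + highSurrogateIsLowerCase());
        System.out.println("regex finds a match: " + regexFinds());
    }
}
```

A matcher that inspects one char at a time sees only the individual surrogates, neither of which is lowercase on its own, which is consistent with the failure reported above.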

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
