harmony-dev mailing list archives

From Paulex Yang <paulex.y...@gmail.com>
Subject Re: [classlib][regex|luni] build break
Date Thu, 12 Oct 2006 12:32:18 GMT
Anton Ivanov wrote:
> The problem is in the RI. These failures are the RI bugs.
>
> The test failures on the RI you pointed out can be grouped into the two
I guess you meant three ;-)
> categories:
Is category 2, the supplementary character issue, covered by HARMONY-933?
How about documenting the details below on that JIRA and marking it as a
non-bug difference?
>
> 1. Canonical equivalence related.
>
> java.util.regex.PatternSyntaxException: Unclosed group near index 59
> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
> ^
> at java.util.regex.Pattern.error(Pattern.java:1650)
> at java.util.regex.Pattern.accept(Pattern.java:1508)
> at java.util.regex.Pattern.group0(Pattern.java:2460)
> at java.util.regex.Pattern.sequence(Pattern.java:1715)
> at java.util.regex.Pattern.expr(Pattern.java:1687)
> at java.util.regex.Pattern.compile(Pattern.java:1397)
> at java.util.regex.Pattern.<init>(Pattern.java:1124)
> at java.util.regex.Pattern.compile(Pattern.java:840)
> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
>
> The RI fails to compile the following pattern with the CANON_EQ flag
> specified:
>       "\u01E0\u00CCcdb(ac)"
> This is because the RI tries to build alternations to take canonical
> equivalence into account.
> The RI does this correctly in simple cases, but if the pattern is a little
> more complex, it builds the alternations incorrectly and fails to compile
> the pattern.
> See the following bug:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4867170
>
> I wrote about these test failures on the RI here:
> http://issues.apache.org/jira/browse/HARMONY-933
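To make the canonical-equivalence point concrete, here is a minimal sketch using java.text.Normalizer (available from Java 6). The class name CanonEqSketch and the helper compilesWithCanonEq are hypothetical and are not part of the Harmony tests. The sketch shows the canonical decompositions of the two precomposed characters in the pattern, and it probes whether the pattern compiles with CANON_EQ. It makes no assumption about the outcome of that probe, since it depends on the implementation and version in use.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CanonEqSketch {
    /** Returns the canonical decomposition (NFD) of the given text. */
    public static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /** Hypothetical helper: reports whether a regex compiles with CANON_EQ. */
    public static boolean compilesWithCanonEq(String regex) {
        try {
            Pattern.compile(regex, Pattern.CANON_EQ);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+01E0 decomposes to 'A' followed by U+0307 and U+0304.
        System.out.println(nfd("\u01E0").equals("A\u0307\u0304"));
        // U+00CC decomposes to 'I' followed by U+0300.
        System.out.println(nfd("\u00CC").equals("I\u0300"));
        // Implementation-dependent result, so it is printed rather than asserted.
        System.out.println(compilesWithCanonEq("\u01E0\u00CCcdb(ac)"));
    }
}
```

An implementation that honors CANON_EQ has to accept every canonically equivalent spelling of these sequences, which is why the failing pattern in the stack trace above expands into such a long alternation.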
>
> 2. Supplementary Unicode codepoints related.
>
> For example, let's look at:
>
> testPredefinedClassesWithSurrogatesSupplementary
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:47)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertFalse(Assert.java:34)
> at junit.framework.Assert.assertFalse(Assert.java:41)
> at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
>
> Here we try to find a surrogate character in the code point \uD916\uDE27.
> The technical report says:
> http://www.unicode.org/reports/tr18/#Supplementary_Characters
>
> "Surrogate pairs (or their equivalents in other encoding forms) are to be
> handled internally as single code point values"
>
> So we have to treat text as code points, not code units.
> Here \uD916\uDE27 is a single code point consisting of
> two code units (two surrogate characters), so we find nothing.
> (I added a comment with this explanation to
> testPredefinedClassesWithSurrogatesSupplementary().)
> But the RI doesn't treat this code point as a single unit. This is an RI
> bug, and the behavior is wrong according to the technical report.
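The difference between code units and code points can be illustrated with the standard String and Character APIs. This is a minimal sketch, not the Harmony test itself: the class name CodePointSketch and its helpers are hypothetical, and the \p{Cs} probe at the end is only an assumed stand-in for whatever class the real test uses. Its result is printed without an expected value because it depends on the regex implementation.

```java
import java.util.regex.Pattern;

public class CodePointSketch {
    /** Number of UTF-16 code units (Java chars) in the string. */
    public static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "\uD916\uDE27";

        // Two surrogate code units...
        System.out.println(codeUnits(s));
        System.out.println(Character.isHighSurrogate(s.charAt(0)));
        System.out.println(Character.isLowSurrogate(s.charAt(1)));

        // ...that together encode one supplementary code point, U+55A27.
        System.out.println(codePoints(s));
        System.out.println(Integer.toHexString(s.codePointAt(0)));

        // Assumed probe: a code-point-aware matcher should not find a lone
        // surrogate here. The outcome is implementation-dependent.
        System.out.println(Pattern.compile("\\p{Cs}").matcher(s).find());
    }
}
```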
>
> 3. Error messages
> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
> b)a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
> bcde)a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
> bbg())a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
> cdb(?i))a
> ^
> are printed in testCompileStringint().
> This test is needed to verify that the appropriate exceptions are thrown
> when we compile a malformed regular expression.
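One way to make such a check self-describing is to assert on the exception instead of printing it, so that nothing appears on the console when the test passes. The sketch below is only an illustration of that idea using the four patterns from the output above. The class MalformedRegexSketch and its helper isRejected are hypothetical names, not code from PatternTest.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class MalformedRegexSketch {
    /** Returns true if compiling the regex throws PatternSyntaxException. */
    public static boolean isRejected(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Each pattern contains a ')' with no matching '('.
        String[] malformed = { "b)a", "bcde)a", "bbg())a", "cdb(?i))a" };
        for (String regex : malformed) {
            if (!isRejected(regex)) {
                throw new AssertionError("expected a syntax error for: " + regex);
            }
        }
    }
}
```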
>
> Thanks,
> Anton
>
> On 10/12/06, Spark Shen <smallsmallorgan@gmail.com> wrote:
>>
>> Anton Ivanov wrote:
>> > On 10/10/06, Anton Ivanov <antiva@gmail.com> wrote:
>> >>
>> >>
>> >>
>> >> On 10/10/06, Tim Ellison <t.p.ellison@gmail.com> wrote:
>> >> >
>> >> > So I checked in a patch for HARMONY-688's regex fix, and it passed the
>> >> > regex unit tests, but causes the existing luni tests to fail in
>> >> > java.util.Scanner. I've not figured out the base cause of the failure
>> >> > so I've backed out the changes.
>> >> >
>> >> > Regards,
>> >> > Tim
>> >> >
>> >> > --
>> >> >
>> >> > Tim Ellison (t.p.ellison@gmail.com )
>> >> > IBM Java technology centre, UK.
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > Terms of use : http://incubator.apache.org/harmony/mailing.html
>> >> > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
>> >> > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> This is my patch.
>> >> I'll look into this problem and try to correct the patch.
>> >>
>> >> Thanks,
>> >> Anton
>> >>
>> > There was a bug in the newly created class SupplRangeSet.java.
>> > There was the following code in the method matches() of
>> > SupplRangeSet.java:
>> > ...
>> > if (stringIndex < strLength) {
>> >     char high = testString.charAt(stringIndex++);
>> >
>> >     if (contains(high) &&
>> >             next.matches(stringIndex, testString, matchResult) > 0) {
>> >         return 1;
>> >     }
>> > ...
>> > But it is wrong simply to return 1, even though the comment on
>> > matches() in AbstractSet.java reads:
>> >
>> > "Checks if this node matches in given position and recursively call
>> > next node matches on positive self match. Returns positive integer if
>> > entire match succeed, negative otherwise
>> > return -1 if match fails or n > 0;"
>> > In fact, matches() does not return just any positive n > 0. On a
>> > successful match attempt, n is an offset.
>> > This fact was taken into account in all the old classes of
>> > java.util.regex, but I forgot it in SupplRangeSet.java.
>> > So I corrected the matches() method of the SupplRangeSet class as follows:
>> > ...
>> > int offset = -1;
>> > if (stringIndex < strLength) {
>> >     char high = testString.charAt(stringIndex++);
>> >
>> >     if (contains(high) &&
>> >             (offset = next.matches(stringIndex, testString,
>> >                     matchResult)) > 0) {
>> >         return offset;
>> >     }
>> > ...
>> > I corrected the patch and attached it to the issue.
>> > I verified that regex and luni tests pass normally with the patch
>> > applied.
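The contract described above, where each node returns the offset reported by the rest of the chain rather than a bare success flag, can be sketched with a toy node chain. Everything here is hypothetical and greatly simplified: Node, End, Literal and matchEnd are illustration-only names, not Harmony's AbstractSet or SupplRangeSet, and the toy uses >= 0 as its success test. The buggy flag reproduces the "return 1" mistake so the two behaviors can be compared.

```java
public class OffsetChainSketch {
    /** Toy stand-in for a regex node; NOT Harmony's AbstractSet. */
    abstract static class Node {
        /** Returns the end offset of the whole match, or -1 on failure. */
        abstract int matches(int index, CharSequence text);
    }

    /** Terminal node: the match ends at the current position. */
    static class End extends Node {
        int matches(int index, CharSequence text) {
            return index;
        }
    }

    /** Matches one literal char, then delegates to the rest of the chain. */
    static class Literal extends Node {
        private final char expected;
        private final Node next;
        private final boolean buggy;

        Literal(char expected, Node next, boolean buggy) {
            this.expected = expected;
            this.next = next;
            this.buggy = buggy;
        }

        int matches(int index, CharSequence text) {
            if (index < text.length() && text.charAt(index) == expected) {
                int offset = next.matches(index + 1, text);
                if (offset >= 0) {
                    // The buggy variant discards the offset, as "return 1" did.
                    return buggy ? 1 : offset;
                }
            }
            return -1;
        }
    }

    /** Builds a chain for the literal and returns what the head node reports. */
    public static int matchEnd(String literal, String text, boolean buggy) {
        Node chain = new End();
        for (int i = literal.length() - 1; i >= 0; i--) {
            chain = new Literal(literal.charAt(i), chain, buggy);
        }
        return chain.matches(0, text);
    }

    public static void main(String[] args) {
        System.out.println(matchEnd("abc", "abcd", false)); // 3: the real end offset
        System.out.println(matchEnd("abc", "abcd", true));  // 1: the offset is lost
    }
}
```

A caller that relies on the returned offset to continue scanning receives the wrong position from the buggy variant even though the match itself succeeded. That is one plausible way such a mistake could pass the regex tests yet still break a heavier consumer of the regex API.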
>> >
>> > Thanks,
>> > Anton
>> >
>> Hi Anton:
>> It must be very exciting to handle such a complex problem. :-)
>>
>> But after applying the new patch (with the test patch applied as well), I
>> still got problems:
>> In the test class org.apache.harmony.tests.java.util.regex.PatternTest, 4
>> test methods fail on the RI:
>> testCanonEqFlag:
>> java.util.regex.PatternSyntaxException: Unclosed group near index 59
>> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
>> ^
>> at java.util.regex.Pattern.error(Pattern.java:1650)
>> at java.util.regex.Pattern.accept(Pattern.java:1508)
>> at java.util.regex.Pattern.group0(Pattern.java:2460)
>> at java.util.regex.Pattern.sequence(Pattern.java:1715)
>> at java.util.regex.Pattern.expr(Pattern.java:1687)
>> at java.util.regex.Pattern.compile(Pattern.java:1397)
>> at java.util.regex.Pattern.<init>(Pattern.java:1124)
>> at java.util.regex.Pattern.compile(Pattern.java:840)
>> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
>>
>> testIndexesCanonicalEq:
>> junit.framework.AssertionFailedError: null
>> at junit.framework.Assert.fail(Assert.java:47)
>> at junit.framework.Assert.assertTrue(Assert.java:20)
>> at junit.framework.Assert.assertTrue(Assert.java:27)
>> at org.apache.harmony.tests.java.util.regex.PatternTest.testIndexesCanonicalEq(PatternTest.java:1247)
>>
>> testCanonEqFlagWithSupplementaryCharacters:
>> junit.framework.AssertionFailedError: null
>> at junit.framework.Assert.fail(Assert.java:47)
>> at junit.framework.Assert.assertTrue(Assert.java:20)
>> at junit.framework.Assert.assertTrue(Assert.java:27)
>> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlagWithSupplementaryCharacters(PatternTest.java:1275)
>>
>> testPredefinedClassesWithSurrogatesSupplementary
>> junit.framework.AssertionFailedError: null
>> at junit.framework.Assert.fail(Assert.java:47)
>> at junit.framework.Assert.assertTrue(Assert.java:20)
>> at junit.framework.Assert.assertFalse(Assert.java:34)
>> at junit.framework.Assert.assertFalse(Assert.java:41)
>> at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
>> If they are RI bugs, shall we add comments about them in the test
>> case?
>>
>> Also, error messages are printed to the console on Harmony. Since some
>> test cases use System.out instead of asserts, I could not locate where
>> these error messages come from:
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
>> b)a
>> ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
>> bcde)a
>> ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
>> bbg())a
>> ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
>> cdb(?i))a
>> ^
>> And last, the good news is luni tests do pass. :-)
>>
>> Best regards
>>
>> -- 
>> Spark Shen
>> China Software Development Lab, IBM
>>
>>
>>
>>


-- 
Paulex Yang
China Software Development Lab
IBM


