harmony-dev mailing list archives

From "Anton Ivanov" <ant...@gmail.com>
Subject Re: [classlib][regex|luni] build break
Date Thu, 12 Oct 2006 11:46:59 GMT
The problem is in the RI: these failures are RI bugs.

The test failures on the RI you pointed out fall into two categories
(items 1 and 2 below); item 3 covers the error messages printed on Harmony.

1. Canonical equivalence related.

java.util.regex.PatternSyntaxException: Unclosed group near index 59
(?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
^
at java.util.regex.Pattern.error(Pattern.java:1650)
at java.util.regex.Pattern.accept(Pattern.java:1508)
at java.util.regex.Pattern.group0(Pattern.java:2460)
at java.util.regex.Pattern.sequence(Pattern.java:1715)
at java.util.regex.Pattern.expr(Pattern.java:1687)
at java.util.regex.Pattern.compile(Pattern.java:1397)
at java.util.regex.Pattern.<init>(Pattern.java:1124)
at java.util.regex.Pattern.compile(Pattern.java:840)
at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)

The RI fails to compile the following pattern with CANON_EQ flag specified:
       "\u01E0\u00CCcdb(ac)"
This happens because the RI tries to build alternations to take
canonical equivalence into account.
The RI does this correctly in simple cases, but when the pattern is a
little more complex it builds the alternations incorrectly and fails to
compile the pattern.
See the following bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4867170

I wrote about these test failures on the RI here:
http://issues.apache.org/jira/browse/HARMONY-933
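
To illustrate what CANON_EQ is meant to do, here is a minimal Java sketch.
The class and method names (CanonEqSketch, canonMatches, compileError) are
made up for this example. The simple case is the one given in the
java.util.regex.Pattern Javadoc. Whether the second pattern compiles depends
on the runtime you use, so no result is claimed for it.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CanonEqSketch {
    /** True if regex, compiled with CANON_EQ, matches the whole input. */
    static boolean canonMatches(String regex, String input) {
        return Pattern.compile(regex, Pattern.CANON_EQ).matcher(input).matches();
    }

    /** Returns null if regex compiles with CANON_EQ, else the error description. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex, Pattern.CANON_EQ);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Simple case: 'a' + combining ring above vs. precomposed U+00E5.
        System.out.println(canonMatches("a\u030A", "\u00E5"));
        // The pattern from testCanonEqFlag; the outcome is runtime dependent.
        System.out.println(compileError("\u01E0\u00CCcdb(ac)"));
    }
}
```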

2. Supplementary Unicode codepoints related.

For example, let's look at:

testPredefinedClassesWithSurrogatesSupplementary
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertFalse(Assert.java:34)
at junit.framework.Assert.assertFalse(Assert.java:41)
at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)

Here we try to find a surrogate character in the code point \uD916\uDE27.
Unicode Technical Report #18 says:
http://www.unicode.org/reports/tr18/#Supplementary_Characters

"Surrogate pairs (or their equivalents in other encoding forms) are to be
handled internally as single code point values"

So we have to treat text as code points, not code units.
Here \uD916\uDE27 is one code point consisting of
two code units (two surrogate characters), so we find nothing.
(I added a comment with this explanation to
testPredefinedClassesWithSurrogatesSupplementary().)
But the RI doesn't treat this code point as a single unit. This is an RI
bug: the behavior is wrong according to the technical report.
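
The code unit versus code point distinction can be shown with plain String
APIs. This is only a sketch: CodePointSketch and its two helper methods are
hypothetical names and are not taken from the Harmony tests.

```java
public class CodePointSketch {
    /** True if any whole code point of s lies in the surrogate range U+D800..U+DFFF. */
    static boolean containsSurrogateCodePoint(String s) {
        return s.codePoints()
                .anyMatch(cp -> cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE);
    }

    /** True if any UTF-16 code unit of s is a surrogate. */
    static boolean containsSurrogateCodeUnit(String s) {
        return s.chars().anyMatch(c -> Character.isSurrogate((char) c));
    }

    public static void main(String[] args) {
        String text = "\uD916\uDE27";
        System.out.println(text.length());                          // number of code units
        System.out.println(text.codePointCount(0, text.length())); // number of code points
        System.out.println(containsSurrogateCodeUnit(text));
        System.out.println(containsSurrogateCodePoint(text));
    }
}
```

Read as code units, the string contains two surrogates. Read as code points,
it contains one supplementary character and no surrogate, which is the view
the test expects the regex engine to take.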

3. Error messages
java.util.regex.PatternSyntaxException: unmatched ) near index: 1
b)a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 4
bcde)a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 5
bbg())a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 7
cdb(?i))a
^
These messages are printed by testCompileStringint().
This test verifies that the appropriate exceptions are thrown
when a malformed regular expression is compiled.
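
One way to make such a check visible to the test runner is to assert on the
exception instead of printing it. The sketch below is an assumption about how
this could look, not the code of testCompileStringint(); the class name
MalformedPatternSketch and the helper failsToCompile are invented for the
example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class MalformedPatternSketch {
    /** True if compiling regex throws PatternSyntaxException. */
    static boolean failsToCompile(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The four malformed patterns from the console output above.
        String[] malformed = {"b)a", "bcde)a", "bbg())a", "cdb(?i))a"};
        for (String regex : malformed) {
            if (!failsToCompile(regex)) {
                throw new AssertionError("expected PatternSyntaxException for: " + regex);
            }
        }
        System.out.println("all malformed patterns were rejected");
    }
}
```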

Thanks,
Anton

On 10/12/06, Spark Shen <smallsmallorgan@gmail.com> wrote:
>
> Anton Ivanov wrote:
> > On 10/10/06, Anton Ivanov <antiva@gmail.com> wrote:
> >>
> >>
> >>
> >> On 10/10/06, Tim Ellison <t.p.ellison@gmail.com> wrote:
> >> >
> >> > So I checked in a patch for HARMONY-688's regex fix, and it passed the
> >> > regex unit tests, but causes the existing luni tests to fail in
> >> > java.util.Scanner. I've not figured out the base cause of the failure
> >> > so I've backed out the changes.
> >> >
> >> > Regards,
> >> > Tim
> >> >
> >> > --
> >> >
> >> > Tim Ellison (t.p.ellison@gmail.com )
> >> > IBM Java technology centre, UK.
> >> >
> >> > ---------------------------------------------------------------------
> >> > Terms of use : http://incubator.apache.org/harmony/mailing.html
> >> > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> >> > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> >>
> >>
> >>
> >>
> >>
> >> This is my patch.
> >> I'll look into this problem and try to correct the patch.
> >>
> >> Thanks,
> >> Anton
> >>
> > There was a bug in the newly created class SupplRangeSet.java.
> > There was the following code in the method matches() of
> > SupplRangeSet.java:
> > ...
> > if (stringIndex < strLength) {
> >     char high = testString.charAt(stringIndex++);
> >
> >     if (contains(high) &&
> >             next.matches(stringIndex, testString, matchResult) > 0) {
> >         return 1;
> >     }
> > ...
> > But simply returning 1 is wrong, even though the comments for
> > matches() in AbstractSet.java say:
> >
> > "Checks if this node matches in given position and recursively call
> > next node matches on positive self match. Returns positive integer if
> > entire match succeed, negative otherwise
> > return -1 if match fails or n > 0;"
> > In fact, matches() does not return just any positive n > 0: on a
> > successful match, n is an offset. All the older classes in
> > java.util.regex take this into account, but I forgot it in
> > SupplRangeSet.java.
> > So I corrected the matches() method of the SupplRangeSet class as follows:
> > ...
> > int offset = -1;
> > if (stringIndex < strLength) {
> >     char high = testString.charAt(stringIndex++);
> >
> >     if (contains(high) &&
> >             (offset = next.matches(stringIndex, testString,
> >                     matchResult)) > 0) {
> >         return offset;
> >     }
> > ...
> > I corrected the patch and attached it to the issue.
> > I verified that regex and luni tests pass normally with the patch
> > applied.
> >
> > Thanks,
> > Anton
> >
> Hi Anton:
> It must be very exciting to handle such a complex problem. :-)
>
> But after applying the new patch (and test patch applied), I still got
> problems:
> Of test class: org.apache.harmony.tests.java.util.regex.PatternTest, 4
> test methods fail on RI:
> testCanonEqFlag:
> java.util.regex.PatternSyntaxException: Unclosed group near index 59
> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
> ^
> at java.util.regex.Pattern.error(Pattern.java:1650)
> at java.util.regex.Pattern.accept(Pattern.java:1508)
> at java.util.regex.Pattern.group0(Pattern.java:2460)
> at java.util.regex.Pattern.sequence(Pattern.java:1715)
> at java.util.regex.Pattern.expr(Pattern.java:1687)
> at java.util.regex.Pattern.compile(Pattern.java:1397)
> at java.util.regex.Pattern.<init>(Pattern.java:1124)
> at java.util.regex.Pattern.compile(Pattern.java:840)
> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
>
> testIndexesCanonicalEq:
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:47)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at org.apache.harmony.tests.java.util.regex.PatternTest.testIndexesCanonicalEq(PatternTest.java:1247)
>
> testCanonEqFlagWithSupplementaryCharacters:
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:47)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlagWithSupplementaryCharacters(PatternTest.java:1275)
>
> testPredefinedClassesWithSurrogatesSupplementary
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:47)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertFalse(Assert.java:34)
> at junit.framework.Assert.assertFalse(Assert.java:41)
> at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
> If they are bugs in the RI, shall we add comments about them to the test
> case?
>
> Also, error messages are printed to the console on Harmony. Since some
> test cases use System.out instead of asserts, I could not locate where
> these error messages come from:
> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
> b)a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
> bcde)a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
> bbg())a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
> cdb(?i))a
> ^
> And last, the good news is luni tests do pass. :-)
>
> Best regards
>
> --
> Spark Shen
> China Software Development Lab, IBM
>
>
>
>
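
As an illustration of the matches() contract discussed in the quoted message,
here is a simplified Java sketch of a node chain in which each node returns
the end offset reported by the rest of the chain. It is not Harmony's
AbstractSet or SupplRangeSet: the Node interface, END, literal() and the
propagateOffset switch are all hypothetical, and the sketch uses -1 for
failure and any non-negative value for the end offset.

```java
public class MatchOffsetSketch {
    /** Hypothetical stand-in for a regex node; not Harmony's AbstractSet. */
    interface Node {
        /** Returns the end offset of the whole match, or -1 on failure. */
        int matches(int index, CharSequence text);
    }

    /** Terminal node: the match ends wherever the chain has advanced to. */
    static final Node END = (index, text) -> index;

    /** Matches one char, then delegates to the next node in the chain. */
    static Node literal(char expected, Node next, boolean propagateOffset) {
        return (index, text) -> {
            if (index < text.length() && text.charAt(index) == expected) {
                int offset = next.matches(index + 1, text);
                if (offset >= 0) {
                    // Returning a constant 1 here mirrors the bug described above.
                    return propagateOffset ? offset : 1;
                }
            }
            return -1;
        };
    }

    public static void main(String[] args) {
        Node fixed = literal('a', literal('b', END, true), true);
        Node buggy = literal('a', literal('b', END, false), false);
        System.out.println(fixed.matches(0, "abc")); // end offset of "ab"
        System.out.println(buggy.matches(0, "abc")); // constant, offset is lost
    }
}
```

A caller that uses the return value as the end of the match, as a scanner
would, only gets the right position from the variant that propagates the
offset.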