harmony-dev mailing list archives

From Spark Shen <smallsmallor...@gmail.com>
Subject Re: [classlib][regex|luni] build break
Date Fri, 13 Oct 2006 01:55:20 GMT
Hi Anton:

There are still two problems here:

1. These error messages are still printed to the console on Harmony:
java.util.regex.PatternSyntaxException: unmatched ) near index: 1
b)a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 4
bcde)a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 5
bbg())a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 7
cdb(?i))a

2. Some test cases in PatternTest simply call System.out.println()
instead of using assertions, so failed test cases
cannot easily be identified from the JUnit output.
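
To illustrate point 2, here is a minimal sketch of checking for the exception instead of printing it. The class and helper names are hypothetical and not taken from PatternTest; the four patterns are the ones from the console output above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of PatternTest.
final class SyntaxErrorChecks {
    // The four malformed patterns from the console output above.
    static final String[] BAD_PATTERNS = { "b)a", "bcde)a", "bbg())a", "cdb(?i))a" };

    // Returns the exception thrown while compiling regex, or null if it compiled.
    static PatternSyntaxException compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    // Fails loudly when a malformed pattern is accepted, instead of printing.
    static void assertRejected(String regex) {
        if (compileError(regex) == null) {
            throw new AssertionError("expected PatternSyntaxException for: " + regex);
        }
    }

    public static void main(String[] args) {
        for (String regex : BAD_PATTERNS) {
            assertRejected(regex);
        }
    }
}
```

In a JUnit test the AssertionError could be replaced by a call to fail(), so that an unexpected success shows up as a failed test rather than as console output.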

Best regards

Anton Ivanov wrote:
> I documented the details on both JIRA issues:
> http://issues.apache.org/jira/browse/HARMONY-688
> http://issues.apache.org/jira/browse/HARMONY-933
> So, please mark these issues as non-bug-differences if needed.
>
> Thanks,
> Anton
>
> On 10/12/06, Paulex Yang <paulex.yang@gmail.com> wrote:
>>
>> Anton Ivanov wrote:
>> > The problem is in the RI. These failures are the RI bugs.
>> >
>> > The test failures on the RI you pointed out can be grouped into two
>> I guess you meant three ;-)
>> > categories:
>> Is category 2, the supplementary character issue, covered by
>> HARMONY-933? How about documenting the details below on that JIRA
>> and marking it as a non-bug difference?
>> >
>> > 1. Canonical equivalence related.
>> >
>> > java.util.regex.PatternSyntaxException: Unclosed group near index 59
>> > (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
>> > ^
>> > at java.util.regex.Pattern.error(Pattern.java:1650)
>> > at java.util.regex.Pattern.accept(Pattern.java:1508)
>> > at java.util.regex.Pattern.group0(Pattern.java:2460)
>> > at java.util.regex.Pattern.sequence(Pattern.java:1715)
>> > at java.util.regex.Pattern.expr(Pattern.java:1687)
>> > at java.util.regex.Pattern.compile(Pattern.java:1397)
>> > at java.util.regex.Pattern.<init>(Pattern.java:1124)
>> > at java.util.regex.Pattern.compile(Pattern.java:840)
>> > at
>> > org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(
>> > PatternTest.java:1060)
>> >
>> > The RI fails to compile the following pattern with CANON_EQ flag
>> > specified:
>> >       "\u01E0\u00CCcdb(ac)"
>> > This happens because the RI tries to build alternations to take
>> > canonical equivalence into account.
>> > The RI manages this in simple cases, but if the pattern is a little
>> > more complex the RI fails to compile it.
>> > So the RI builds these alternations incorrectly.
>> > See the following bug:
>> > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4867170
>> >
>> > I wrote about these test failures on the RI here:
>> > http://issues.apache.org/jira/browse/HARMONY-933
>> >
>> > 2. Supplementary Unicode codepoints related.
>> >
>> > For example, let's look at:
>> >
>> > testPredefinedClassesWithSurrogatesSupplementary
>> > junit.framework.AssertionFailedError: null
>> > at junit.framework.Assert.fail(Assert.java:47)
>> > at junit.framework.Assert.assertTrue(Assert.java:20)
>> > at junit.framework.Assert.assertFalse(Assert.java:34)
>> > at junit.framework.Assert.assertFalse(Assert.java:41)
>> > at
>> > org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary
>> > (PatternTest.java:1477)
>> >
>> > Here we try to find a surrogate character in the code point \uD916\uDE27.
>> > The technical report says here:
>> > http://www.unicode.org/reports/tr18/#Supplementary_Characters
>> >
>> > "Surrogate pairs (or their equivalents in other encoding forms) are to be
>> > handled internally as single code point values"
>> >
>> > So we have to treat text as code points, not code units.
>> > Here \uD916\uDE27 is one code point consisting of
>> > two code units (two surrogate characters), so we find nothing.
>> > (I added a comment with this explanation to
>> > testPredefinedClassesWithSurrogatesSupplementary()).
>> > But the RI doesn't treat this code point as a single unit. This is an
>> > RI bug: the behavior is wrong according to the technical report.
>> >
>> > 3. Error messages
>> > java.util.regex.PatternSyntaxException: unmatched ) near index: 1
>> > b)a
>> > ^
>> > java.util.regex.PatternSyntaxException: unmatched ) near index: 4
>> > bcde)a
>> > ^
>> > java.util.regex.PatternSyntaxException: unmatched ) near index: 5
>> > bbg())a
>> > ^
>> > java.util.regex.PatternSyntaxException: unmatched ) near index: 7
>> > cdb(?i))a
>> > ^
>> > are printed in testCompileStringint().
>> > This test is needed to verify that appropriate exceptions are thrown
>> > if we compile a malformed regular expression.
>> >
>> > Thanks,
>> > Anton
>> >
>> > On 10/12/06, Spark Shen <smallsmallorgan@gmail.com> wrote:
>> >>
>> >> Anton Ivanov wrote:
>> >> > On 10/10/06, Anton Ivanov <antiva@gmail.com> wrote:
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 10/10/06, Tim Ellison <t.p.ellison@gmail.com> wrote:
>> >> >> >
>> >> >> > So I checked in a patch for HARMONY-688's regex fix, and it passed the
>> >> >> > regex unit tests, but causes the existing luni tests to fail in
>> >> >> > java.util.Scanner. I've not figured out the base cause of the failure
>> >> >> > so I've backed out the changes.
>> >> >> >
>> >> >> > Regards,
>> >> >> > Tim
>> >> >> >
>> >> >> > --
>> >> >> >
>> >> >> > Tim Ellison (t.p.ellison@gmail.com )
>> >> >> > IBM Java technology centre, UK.
>> >> >> >
>> >> >> >
>> >> >> > ---------------------------------------------------------------------
>> >> >> > Terms of use : http://incubator.apache.org/harmony/mailing.html
>> >> >> > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
>> >> >> > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> This is my patch.
>> >> >> I'll look into this problem and try to correct the patch.
>> >> >>
>> >> >> Thanks,
>> >> >> Anton
>> >> >>
>> >> > There was a bug in the newly created class SupplRangeSet.java.
>> >> > There was the following code in the method matches() of
>> >> > SupplRangeSet.java:
>> >> > ...
>> >> > if (stringIndex < strLength) {
>> >> >     char high = testString.charAt(stringIndex++);
>> >> >
>> >> >     if (contains(high) &&
>> >> >             next.matches(stringIndex, testString, matchResult) > 0) {
>> >> >         return 1;
>> >> >     }
>> >> > ...
>> >> > But simply returning 1 is wrong, even though the comments for
>> >> > matches() in AbstractSet.java say:
>> >> >
>> >> > "Checks if this node matches in given position and recursively call
>> >> > next node matches on positive self match. Returns positive integer if
>> >> > entire match succeed, negative otherwise
>> >> > return -1 if match fails or n > 0;"
>> >> > In fact matches() does not return just any positive n > 0: on a
>> >> > successful match attempt, n is an offset. All the old classes in
>> >> > java.util.regex take this into account, but I forgot it in
>> >> > SupplRangeSet.java.
>> >> > So I corrected the matches() method of the SupplRangeSet class as follows:
>> >> > ...
>> >> > int offset = -1;
>> >> > if (stringIndex < strLength) {
>> >> >     char high = testString.charAt(stringIndex++);
>> >> >
>> >> >     if (contains(high) &&
>> >> >             (offset = next.matches(stringIndex, testString,
>> >> >                     matchResult)) > 0) {
>> >> >         return offset;
>> >> >     }
>> >> > ...
>> >> > I corrected the patch and attached it to the issue.
>> >> > I verified that regex and luni tests pass normally with the patch
>> >> > applied.
>> >> >
>> >> > Thanks,
>> >> > Anton
>> >> >
>> >> Hi Anton:
>> >> It must be very exciting to handle such a complex problem. :-)
>> >>
>> >> But after applying the new patch (with the test patch also applied), I
>> >> still got problems:
>> >> In the test class org.apache.harmony.tests.java.util.regex.PatternTest, 4
>> >> test methods fail on the RI:
>> >> testCanonEqFlag:
>> >> java.util.regex.PatternSyntaxException: Unclosed group near index 59
>> >> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
>> >> ^
>> >> at java.util.regex.Pattern.error(Pattern.java:1650)
>> >> at java.util.regex.Pattern.accept(Pattern.java:1508)
>> >> at java.util.regex.Pattern.group0(Pattern.java:2460)
>> >> at java.util.regex.Pattern.sequence(Pattern.java:1715)
>> >> at java.util.regex.Pattern.expr(Pattern.java:1687)
>> >> at java.util.regex.Pattern.compile(Pattern.java:1397)
>> >> at java.util.regex.Pattern.<init>(Pattern.java:1124)
>> >> at java.util.regex.Pattern.compile(Pattern.java:840)
>> >> at
>> >> org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(
>> >> PatternTest.java:1060)
>> >>
>> >> testIndexesCanonicalEq:
>> >> junit.framework.AssertionFailedError: null
>> >> at junit.framework.Assert.fail(Assert.java:47)
>> >> at junit.framework.Assert.assertTrue(Assert.java:20)
>> >> at junit.framework.Assert.assertTrue(Assert.java:27)
>> >> at
>> >> org.apache.harmony.tests.java.util.regex.PatternTest.testIndexesCanonicalEq
>> >> (PatternTest.java:1247)
>> >>
>> >> testCanonEqFlagWithSupplementaryCharacters:
>> >> junit.framework.AssertionFailedError: null
>> >> at junit.framework.Assert.fail(Assert.java:47)
>> >> at junit.framework.Assert.assertTrue(Assert.java:20)
>> >> at junit.framework.Assert.assertTrue(Assert.java:27)
>> >> at
>> >> org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlagWithSupplementaryCharacters
>> >> (PatternTest.java:1275)
>> >>
>> >> testPredefinedClassesWithSurrogatesSupplementary
>> >> junit.framework.AssertionFailedError: null
>> >> at junit.framework.Assert.fail(Assert.java:47)
>> >> at junit.framework.Assert.assertTrue(Assert.java:20)
>> >> at junit.framework.Assert.assertFalse(Assert.java:34)
>> >> at junit.framework.Assert.assertFalse(Assert.java:41)
>> >> at
>> >> org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary
>> >> (PatternTest.java:1477)
>> >> If they are RI bugs, shall we add comments about them in the test
>> >> case?
>> >>
>> >> Also, error messages are printed to the console on Harmony. Since some
>> >> test cases use System.out instead of assertions, I could not locate where
>> >> these error messages come from:
>> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
>> >> b)a
>> >> ^
>> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
>> >> bcde)a
>> >> ^
>> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
>> >> bbg())a
>> >> ^
>> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
>> >> cdb(?i))a
>> >> ^
>> >> And last, the good news is that the luni tests do pass. :-)
>> >>
>> >> Best regards
>> >>
>> >> --
>> >> Spark Shen
>> >> China Software Development Lab, IBM
>> >>
>> >>
>> >>
>> >>
>>
>>
>> -- 
>> Paulex Yang
>> China Software Development Lab
>> IBM
>>
>>
>>
>>
>>
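
On the code point versus code unit distinction discussed in category 2 of the quoted thread, the following small sketch shows how \uD916\uDE27 looks through the standard String and Character APIs. The class name is hypothetical, and the sketch makes no claim about how any particular regex implementation treats this input.

```java
// Hypothetical demo class: shows that \uD916\uDE27 is two code units
// (a surrogate pair) but only one code point.
final class CodePointDemo {
    static final String SUPPLEMENTARY = "\uD916\uDE27";

    // Number of UTF-16 code units (chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = SUPPLEMENTARY;
        System.out.println("code units:  " + codeUnits(s));
        System.out.println("code points: " + codePoints(s));
        System.out.println("high surrogate first: " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("low surrogate second: " + Character.isLowSurrogate(s.charAt(1)));
        System.out.println("supplementary: "
                + Character.isSupplementaryCodePoint(s.codePointAt(0)));
    }
}
```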

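The return value fix for matches() quoted above can also be pictured with a toy chain of nodes. This is only a sketch under my own assumptions: the Node, EndNode and CharNode classes are hypothetical and much simpler than the real AbstractSet hierarchy, the final node is assumed to report the index where the match ended, and success is tested with >= 0 rather than > 0 to keep the toy simple.

```java
// Hypothetical toy model of a chain of matcher nodes; not Harmony's classes.
final class OffsetDemo {
    static abstract class Node {
        // Returns -1 on failure, otherwise the offset reported by the chain.
        abstract int matches(int index, CharSequence text);
    }

    // Terminal node: assumed to report the index where the match ended.
    static final class EndNode extends Node {
        int matches(int index, CharSequence text) {
            return index;
        }
    }

    // Matches one literal char, then delegates to the next node.
    static final class CharNode extends Node {
        final char expected;
        final Node next;
        final boolean buggy; // true reproduces the "return 1" mistake

        CharNode(char expected, Node next, boolean buggy) {
            this.expected = expected;
            this.next = next;
            this.buggy = buggy;
        }

        int matches(int index, CharSequence text) {
            if (index < text.length() && text.charAt(index) == expected) {
                int offset = next.matches(index + 1, text);
                if (offset >= 0) {
                    // The fix: propagate the offset instead of a constant 1.
                    return buggy ? 1 : offset;
                }
            }
            return -1;
        }
    }

    // Builds a chain that matches the given literal string.
    static Node chain(String literal, boolean buggy) {
        Node node = new EndNode();
        for (int i = literal.length() - 1; i >= 0; i--) {
            node = new CharNode(literal.charAt(i), node, buggy);
        }
        return node;
    }

    public static void main(String[] args) {
        System.out.println("fixed: " + chain("abc", false).matches(0, "abcd"));
        System.out.println("buggy: " + chain("abc", true).matches(0, "abcd"));
    }
}
```

With the constant return value, a caller that relies on the offset (as the Scanner tests in luni apparently did) gets a wrong position even though the match itself succeeded.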

-- 
Spark Shen
China Software Development Lab, IBM



