jakarta-regexp-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 14954] New: - A bug caused by '-' in char class def ('[...]')
Date Fri, 29 Nov 2002 11:05:59 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14954>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14954


           Summary: A bug caused by '-' in char class def ('[...]')
           Product: Regexp
           Version: unspecified
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Other
        AssignedTo: regexp-dev@jakarta.apache.org
        ReportedBy: ikuya@flab.fujitsu.co.jp


When a character class definition ('[...]') contains a '-', a simple character
in the definition is ignored in some cases. In those cases the instructions in
the compiled REProgram object are not what one would expect. This may be
related to bugs #2121 and #5212.

For example, '[a-zA]' works fine, but in '[Aa-z]' the 'A' is ignored, and in
'[abcd\-]' the 'd' is ignored. In each failing case the ignored character sits
two characters before the '-'.
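
For comparison, the intended meaning of these three classes can be checked
against the JDK's own java.util.regex package. This is only a sketch: it uses a
different engine from Jakarta Regexp, so it shows what the classes are expected
to match, not what RECompiler currently does. The class name
CharClassExpectation, the helper firstMatch, and the input strings are
made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassExpectation {
    /** Returns the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // 'A' must be accepted whether it is written after or before the range.
        System.out.println(firstMatch("[a-zA]+", "JakartaAnt")); // akartaAnt
        System.out.println(firstMatch("[Aa-z]+", "JakartaAnt")); // akartaAnt
        // 'd' must be accepted even though an escaped '-' follows it.
        System.out.println(firstMatch("[abcd\\-]+", "xd-c"));    // d-c
    }
}
```

An engine that drops 'A' from '[Aa-z]' would return "akarta" for the second
call, and one that drops 'd' from '[abcd\-]' would return "-c" for the third.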

Near Line 710 in RECompiler.java, we can see:
>                  // If simple character and not start of range, include it
>                  if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
>                  {
>                      range.include(simpleChar, include);
>                  }
As I understand it, idx already points at the character that follows
simpleChar at this point. simpleChar should be skipped only when the character
immediately after it (if any) is '-', because in that case simpleChar becomes
the start of a new range. The following check therefore seems correct:
>                 if (idx >= len || pattern.charAt(idx) != '-')
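
The off-by-one can be reproduced with a small stand-alone model. What follows
is a simplified sketch, not the RECompiler source: the class
CharClassLookahead, the method parse, and the lookahead parameter are
hypothetical, negation and special escapes are left out, and a '-' is assumed
to have a character on each side. The only thing it keeps from the real code is
the point under discussion: idx has already moved past simpleChar when the
"start of range" check runs.

```java
import java.util.Set;
import java.util.TreeSet;

public class CharClassLookahead {
    /**
     * Parses a bracketed class such as "[Aa-z]" into the set of accepted chars.
     * lookahead is the offset added to idx in the "start of range" check:
     * 1 models the original check, 0 models the proposed one.
     */
    static Set<Character> parse(String pattern, int lookahead) {
        Set<Character> accepted = new TreeSet<>();
        int len = pattern.length();
        int idx = 1;                       // skip the opening '['
        boolean definingRange = false;
        char last = 0;                     // most recent simple character
        char rangeStart = 0;

        while (idx < len && pattern.charAt(idx) != ']') {
            char c = pattern.charAt(idx);
            char simpleChar;
            if (c == '-') {
                // An unescaped '-' opens a range starting at the previous char.
                definingRange = true;
                rangeStart = last;
                idx++;
                continue;
            } else if (c == '\\' && idx + 1 < len) {
                // Simplification: an escape yields the next char literally.
                simpleChar = pattern.charAt(idx + 1);
                idx += 2;
            } else {
                simpleChar = c;
                idx++;
            }

            // idx now points at the character AFTER simpleChar.
            if (definingRange) {
                for (int ch = rangeStart; ch <= simpleChar; ch++) {
                    accepted.add((char) ch);
                }
                definingRange = false;
            } else {
                int look = idx + lookahead;
                if (look >= len || pattern.charAt(look) != '-') {
                    accepted.add(simpleChar);
                }
                last = simpleChar;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(parse("[Aa-z]", 1).contains('A'));     // false
        System.out.println(parse("[Aa-z]", 0).contains('A'));     // true
        System.out.println(parse("[abcd\\-]", 1).contains('d'));  // false
        System.out.println(parse("[abcd\\-]", 0).contains('d'));  // true
    }
}
```

With lookahead 1 the check inspects the character two positions after
simpleChar, which is why 'A' in '[Aa-z]' and 'd' in '[abcd\-]' are dropped,
while '[a-zA]' is unaffected under either setting.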

I tried this fix on a CVS checkout of the source tree last night, together with
some new test cases, and it worked. I cannot be sure the change has no side
effects, but at least all tests in RETest.txt still pass.

The diff output follows. Does this help?

Ikuya


Index: docs/RETest.txt
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/docs/RETest.txt,v
retrieving revision 1.3
diff -c -r1.3 RETest.txt
*** docs/RETest.txt     27 Feb 2001 08:37:05 -0000      1.3
--- docs/RETest.txt     28 Nov 2002 14:22:25 -0000
***************
*** 1011,1014 ****
--- 1011,1030 ----
  YES
  aaabc

+ #168
+ [a-zA]+
+ JakartaAnt
+ YES
+ akartaAnt
+
+ #169
+ [Aa-z]+
+ JakartaAnt
+ YES
+ akartaAnt
+
+ #170
+ [akrt\-]+
+ Jakarta-Ant
+ YES
+ akarta-
Index: src/java/org/apache/regexp/RECompiler.java
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/src/java/org/apache/regexp/RECompiler.java,v
retrieving revision 1.4
diff -c -r1.4 RECompiler.java
*** src/java/org/apache/regexp/RECompiler.java  27 Feb 2001 08:37:05 -0000      1.4
--- src/java/org/apache/regexp/RECompiler.java  28 Nov 2002 14:22:26 -0000
***************
*** 710,716 ****
              else
              {
                  // If simple character and not start of range, include it
!                 if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
                  {
                      range.include(simpleChar, include);
                  }
--- 710,716 ----
              else
              {
                  // If simple character and not start of range, include it
!                 if (idx >= len || pattern.charAt(idx) != '-')
                  {
                      range.include(simpleChar, include);
                  }

--
To unsubscribe, e-mail:   <mailto:regexp-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:regexp-dev-help@jakarta.apache.org>
