Date: Fri, 23 Feb 2001 15:16:55 -0800 (PST)
From: Ian Swett
To: regexp-dev@jakarta.apache.org
Subject: Bugs I've found

I've found two bugs recently in regexp. I'm new to the list, so I apologize if these are known issues. I wanted to notify the list of the problems I found, confirm that they are actually bugs, and make sure I'm going about fixing them in the correct manner.

1) RECompiler dies when compiling regular expressions that contain the character sequence '*?('. Sometimes the next offset of a node has not been set to zero, so when next = node + instruction[node + offsetNext] is computed, next is very large and you get an ArrayIndexOutOfBoundsException. I added a check that detects the out-of-bounds case and returns -1 when it occurs. It appears to work, but there may be a more correct way to fix this bug.

2) The other problem is with reluctant closures. Because reluctant closures are not recursive, cases like the following fail:

    b(aaa|aaaaa)*?b

does not accept baaaaaaaaaab (10 a's), when it should. I have tried to rework reluctant closures so that they are implemented more like greedy ones (with recursive ORs), but I don't have it working yet.

Ian Swett
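To make the expected behaviour in item 2 concrete, here is a minimal reference check. It deliberately uses the JDK's java.util.regex engine rather than Jakarta Regexp, on the assumption that a backtracking engine is a fair yardstick for what a reluctant closure should accept. The class and method names (ReluctantClosureCheck, input, accepts) are made up for this sketch and are not part of either library. Ten a's can only be covered as 5 + 5, so the matcher has to back out of the first alternative (aaa) and retry with aaaaa, which is the step a non-recursive closure cannot take.

```java
import java.util.regex.Pattern;

// Reference check only: this exercises java.util.regex, NOT Jakarta Regexp.
// All names here are hypothetical and exist just for this sketch.
class ReluctantClosureCheck {
    // The pattern from the report: a reluctant closure over two alternatives.
    static final String PATTERN = "b(aaa|aaaaa)*?b";

    // Builds the input: 'b', then `count` copies of 'a', then 'b'.
    static String input(int count) {
        StringBuilder sb = new StringBuilder("b");
        for (int i = 0; i < count; i++) {
            sb.append('a');
        }
        return sb.append('b').toString();
    }

    // True when the whole input matches the pattern.
    static boolean accepts(int count) {
        return Pattern.matches(PATTERN, input(count));
    }

    public static void main(String[] args) {
        // 4 a's cannot be split into 3s and 5s; 8 = 3 + 5 and 10 = 5 + 5 can.
        for (int n : new int[] {0, 3, 4, 5, 8, 10}) {
            System.out.println(n + " a's: " + accepts(n));
        }
    }
}
```

Under these assumptions the 10 a's case is accepted and the 4 a's case is rejected, so a fixed reluctant closure in regexp should give the same answers for these inputs.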