From: Michael McCallum
Reply-To: gholam@xtra.co.nz
To: regexp-dev@jakarta.apache.org
Subject: Re: Bugs I've found
Date: Mon, 26 Feb 2001 11:07:14 +0000
Message-Id: <01022611071401.00643@gholam.localdomain>

On Friday 23 February 2001 23:16, you wrote:
} I've found two bugs recently in regexp. I'm new to the list, so I
} apologize if these are known issues.

New solutions are always good for comparison, especially when untainted
by the previous ones. (Like the prime directive :)

}
} 1) RECompiler dies when compiling regular expressions with '*?('
} sequence of characters in the regexp. Sometimes the next offset of a node
} has not been set to zero, so when next = node + instruction[node +
} offsetNext], next is very large, and you get an arrayoutOfBounds
} exception. I added a check to make sure there was no array out of bounds
} case, and returned -1 in that case. It appears to work, but there may be
} a more correct way to fix this bug.

I fixed this by making sure the nextOfEnd did not go past the list of
currently defined nodes.

}
} 2) The other problem is with reluctant closures. Because reluctant
} closures are not recursive, cases like the following fail: b(aaa|aaaaa)*?b
} does not accept baaaaaaaaaab (10 a's), when it should. I have tried to
} change around reluctant closures so they're implemented more similarly to
} greedy ones(with recursive or's), but I don't have it working yet.

I noticed when looking at this that the greedy and non-greedy closures
were implemented differently. I was not sure why. Do you think you can
get the recursive or's working? With the current implementation of the
non-greedy closures, infinite loops can be generated. I stopped this by
not allowing the loop to be created, but if you've fixed the non-greedy
closures then I can get rid of that hack.

Send a patch for the fixes you came up with. I'll put them in.

Michael
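
For anyone wanting to reproduce the two reported cases, here is a minimal
sketch. It rests on assumptions: it uses java.util.regex only as a
reference engine to confirm that b(aaa|aaaaa)*?b should accept
baaaaaaaaaab (10 a's = 5 + 5), and it does not exercise the
org.apache.regexp classes at all. The nextNode helper, its int[] program
layout, and the definedLength parameter are hypothetical names that
illustrate the kind of bounds check described for item 1. They are not
RECompiler's actual fields or either fix mentioned above.

```java
import java.util.regex.Pattern;

class ClosureChecks {
    // Item 2: the pattern and input quoted in the report.
    static final String PATTERN = "b(aaa|aaaaa)*?b";
    static final String INPUT = "baaaaaaaaaab"; // 'b', ten 'a's, 'b'

    // Reference check only: asks java.util.regex (an assumption of this
    // sketch, not the library under discussion) whether the whole input
    // matches the pattern.
    static boolean referenceAccepts(String pattern, String input) {
        return Pattern.matches(pattern, input);
    }

    // Item 1: hypothetical bounds-checked "next" lookup.
    // instruction[node + offsetNext] holds a relative offset to the next
    // node; definedLength is how much of the array is currently defined
    // and is assumed to be <= instruction.length.
    // Returns -1 instead of letting a stale or huge offset index past
    // the defined nodes.
    static int nextNode(int[] instruction, int node, int offsetNext, int definedLength) {
        int slot = node + offsetNext;
        if (slot < 0 || slot >= definedLength) {
            return -1;
        }
        int next = node + instruction[slot];
        return (next < 0 || next >= definedLength) ? -1 : next;
    }

    public static void main(String[] args) {
        System.out.println(referenceAccepts(PATTERN, INPUT));

        // Toy program: node 0 has a sane offset, node 3 has a stale one.
        int[] program = {0, 0, 3, 0, 0, 9999};
        System.out.println(nextNode(program, 0, 2, program.length));
        System.out.println(nextNode(program, 3, 2, program.length));
    }
}
```

A test along these lines could be attached to a patch so the reluctant
closure case is checked against the fixed matcher itself rather than a
reference engine.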