jakarta-oro-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 9556] - Subgroup wrong when matching (.)(?=(.)) against "XY"?
Date Sat, 01 Jun 2002 00:39:17 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9556>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9556

Subgroup wrong when matching (.)(?=(.)) against "XY"?

dfs@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED



------- Additional Comments From dfs@apache.org  2002-06-01 00:39 -------
Perl does fill $2 with Y.  And so does Perl5Matcher.  If you look at
the group offsets, you'll find the matching is performed correctly.  In
other words, two groups are found and the begin and end offsets of the
second group are 1 and 2.  However, because the second group sits inside
a zero-width lookahead assertion, the Y character is not consumed and is
not part of the full match.  So the full match is just 'X'.  Since the
text stored in the MatchResult is the full match 'X', a group whose
offsets fall beyond the end of that match comes back as an empty string.
To show that the full match is just 'X' in Perl, look at group 0 here:

~> perl -e '"xy" =~ /(.)(?=(.))/; print "0: $& 1: $1 2: $2 3: $3\n";'
0: x 1: x 2: y 3: 

Now, the problem we're faced with is one that I'm not sure how to deal
with.  Is the capturing inside the lookahead assertion an undefined
side-effect, much as some situations involving the capturing of
repetitions used to be before Perl 5.6?  Or is it intended that groups
matching outside of the full match be saved?  It's
actually quite tricky to implement this without either maintaining
a reference to a copy of the entire original input (undesirable) or
screwing up a lot of other cases.  I'd like to mull this one over
for a while.  In the meantime, the workaround is to use the group
offset information to extract the appropriate substring from the
input.
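
As a rough illustration of that workaround, here is a minimal sketch that
slices the original input with a group's offsets instead of asking the
match object for the group text.  It uses java.util.regex purely as a
stand-in so that it runs without extra jars; the class and method names
are made up for the example.  With ORO the same idea should apply using
MatchResult.beginOffset(2) and MatchResult.endOffset(2), which I believe
are relative to the start of the input, but treat that as an assumption
and check the javadoc.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of ORO.
class LookaheadGroupDemo {
    // Recover a group's text from the ORIGINAL input using the group's
    // absolute begin/end offsets, rather than relying on the text that
    // the match object stored for the full match.
    static String groupFromOffsets(String input, Matcher m, int group) {
        int begin = m.start(group);
        int end = m.end(group);
        if (begin < 0) {
            return null; // group did not participate in the match
        }
        return input.substring(begin, end);
    }

    public static void main(String[] args) {
        String input = "XY";
        Matcher m = Pattern.compile("(.)(?=(.))").matcher(input);
        if (m.find()) {
            // Group 0 is the full match; the lookahead consumes nothing.
            System.out.println("0: " + m.group(0));
            // Group 2 lies outside the full match, so slice the input.
            System.out.println("2: " + groupFromOffsets(input, m, 2)
                    + " at [" + m.start(2) + "," + m.end(2) + ")");
        }
    }
}
```

The point is only that the caller keeps its own reference to the input,
so the match result never has to hold a copy of the whole thing.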

