jakarta-oro-dev mailing list archives

From "Daniel F. Savarese" <...@savarese.org>
Subject Re: DO NOT REPLY [Bug 8298] New: - default split of empty string should return empty list
Date Fri, 19 Apr 2002 19:02:17 GMT

In message <20020419154327.9831.qmail@nagoya.betaversion.org>, bugzilla@apache.org writes:
>List tokenList = new ArrayList();
>new Perl5Util.split(tokenList,"");
>if( 0 != tokenList.length())
>   System.out.println("aiee!  this doesn't work like any perl I know!");
...
>The javadoc on this is quite clear, and states that it should match the
>functionality of split(/\s+/,"") , which returns an empty list on all of the
>versions of perl I could find back to 5.003

How do people feel about this one (other than the fact that the example
isn't a complete example and uses non-existent methods like List.length(),
which wasted a significant amount of my time)?  It's true that Perl's split
returns an empty list when you split a zero-length string.  So I think
there's no way around it but to add the conditional if(input.length() > 0)
near the end of Perl5Util.split().  However, Perl's behavior does not
produce what I think most people would expect, so I am reluctant to change
Util.split() and would rather document how it differs.
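A minimal sketch of where such a guard would sit, assuming a split loop that appends the text before each match and then appends the remaining tail at the end.  This uses java.util.regex rather than ORO, and the class and method names (SplitGuardSketch, consistentSplit, guardedSplit) are hypothetical; it is not the actual Perl5Util code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitGuardSketch {
    // Hypothetical Util-style split (not the ORO API): every field between
    // matches is kept, so a non-matching pattern returns the input itself,
    // even when the input is zero-length.
    // Assumes the pattern never matches the empty string.
    static List<String> consistentSplit(Pattern pattern, String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int start = 0;
        while (m.find()) {
            fields.add(input.substring(start, m.start()));
            start = m.end();
        }
        fields.add(input.substring(start)); // final field, possibly empty
        return fields;
    }

    // The same loop with the proposed guard before the final field is added,
    // so a zero-length input yields an empty list, as in Perl.
    static List<String> guardedSplit(Pattern pattern, String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int start = 0;
        while (m.find()) {
            fields.add(input.substring(start, m.start()));
            start = m.end();
        }
        if (input.length() > 0) {
            fields.add(input.substring(start));
        }
        return fields;
    }

    public static void main(String[] args) {
        Pattern ws = Pattern.compile("\\s+");
        System.out.println(consistentSplit(ws, "").size()); // prints 1
        System.out.println(guardedSplit(ws, "").size());    // prints 0
        System.out.println(guardedSplit(ws, "a b"));        // prints [a, b]
    }
}
```

Note that the guard only covers the zero-length input; a trailing empty field from input such as "a " is still kept.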

To me, when you split a string on a non-matching pattern, you should
get the original string back, whether or not the string is zero-length.
BTW, the behavior in Perl5Util and Util was fully intended, based on this
rationale of consistent behavior.  However, the special case was not
tested against Perl, and the description of split in 'man perlfunc'
does not explicitly mention any special treatment of zero-length strings.
I would venture that Perl's behavior is an unintended consequence of its
implementation, except that, based on this bug report, people seem to rely
on it.  The Camel book used to say (in the Perl 4 days; I don't know about
now) that "If the PATTERN doesn't match at all, it returns the original
string," which is consistent with my original interpretation.  But
according to 'man perlfunc', "By default, empty leading fields are
preserved, and empty trailing ones are deleted."  A zero-length input
produces a single empty field that is both leading and trailing, so it
gets deleted and the result is an empty list.
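A rough sketch of how the perlfunc rule empties the zero-length case, assuming the fields are first collected "consistently" (every field kept) and trailing empty fields are then dropped.  It uses java.util.regex, and the names TrailingFieldSketch, allFields, and dropTrailingEmpty are hypothetical, not part of ORO or Perl.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingFieldSketch {
    // Hypothetical helper: collect every field between matches, empty or not.
    // Assumes the pattern never matches the empty string.
    static List<String> allFields(Pattern pattern, String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int start = 0;
        while (m.find()) {
            fields.add(input.substring(start, m.start()));
            start = m.end();
        }
        fields.add(input.substring(start));
        return fields;
    }

    // The perlfunc rule: empty leading fields are kept,
    // empty trailing fields are deleted.
    static List<String> dropTrailingEmpty(List<String> fields) {
        List<String> result = new ArrayList<>(fields);
        while (!result.isEmpty() && result.get(result.size() - 1).isEmpty()) {
            result.remove(result.size() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // A zero-length input produces one empty field; it is trailing,
        // so the rule deletes it and nothing is left.
        List<String> raw = allFields(Pattern.compile("\\s+"), "");
        System.out.println(raw.size() + " -> " + dropTrailingEmpty(raw).size()); // prints 1 -> 0

        // Leading empty fields survive; trailing ones do not.
        System.out.println(dropTrailingEmpty(allFields(Pattern.compile(","), ",a,,b,,"))); // prints [, a, , b]
    }
}
```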

Now, this also raises another issue.  If you split a string on itself,
e.g.,
  perl.split(list, "/foo/", "foo");
with either Perl5Util or Util, you'll wind up with a list of two empty
strings, one at the front and one at the back.  This behavior was also
fully intended.  However, Perl gives you an empty list, because under the
"empty leading fields are preserved, and empty trailing ones are deleted"
rule, all of the empty fields count as both leading and trailing.  Again, I
feel this is unexpected behavior, because if you do
  @foo = split(/f/, 'fffoo');
in Perl, you wind up with an array of length 4 with three leading
zero-length strings.  The whole leading vs. trailing field distinction is
artificial.
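A minimal sketch contrasting the two behaviors on both examples, using java.util.regex in place of ORO.  The class and method names (SelfSplitSketch, consistentSplit, perlStyleSplit) are hypothetical, and the Perl-style variant only models the trailing-field deletion described in perlfunc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelfSplitSketch {
    // Hypothetical "consistent" split: keeps every field, empty or not.
    // Assumes the pattern never matches the empty string.
    static List<String> consistentSplit(Pattern pattern, String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int start = 0;
        while (m.find()) {
            fields.add(input.substring(start, m.start()));
            start = m.end();
        }
        fields.add(input.substring(start));
        return fields;
    }

    // Hypothetical Perl-style split: the same fields minus trailing empty ones.
    static List<String> perlStyleSplit(Pattern pattern, String input) {
        List<String> fields = consistentSplit(pattern, input);
        while (!fields.isEmpty() && fields.get(fields.size() - 1).isEmpty()) {
            fields.remove(fields.size() - 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        Pattern foo = Pattern.compile("foo");
        System.out.println(consistentSplit(foo, "foo").size()); // prints 2 (two empty strings)
        System.out.println(perlStyleSplit(foo, "foo").size());  // prints 0

        // Leading empty fields are kept even in the Perl-style variant.
        Pattern f = Pattern.compile("f");
        System.out.println(perlStyleSplit(f, "fffoo"));          // prints [, , , oo]
    }
}
```

Splitting "foo" on itself differs between the two variants, while "fffoo" gives four fields either way, which is the asymmetry described above.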

My recommendation is to change Perl5Util.split() to fully conform with
Perl, because that's what it claims to do and Perl5Util is intended to
match Perl's behavior, however inconsistent it may be.  I would also
recommend leaving Util.split() alone and documenting these additional
differences.  Util.split() should remain an implementation of what a
self-consistent split() with no surprises "should be."

What does everyone think?

daniel


