commons-dev mailing list archives

From "Inger, Matthew" <>
Subject RE: [lang] [Bug 22692] - StringUtils.split ignores empty items
Date Fri, 14 Nov 2003 23:10:40 GMT
I've uploaded the source and test files.  The inner
class is named FastCharSet, since it's no longer strictly
for delimiters, and it's pretty fast (though it's very simple
and does only the basic thing we need it to do).  Please review
and commit the source code.  It might be worthwhile to
break out the FastCharSet class into its own class, but I'll
leave that up to whoever commits it.

-----Original Message-----
From: Inger, Matthew []
Sent: Friday, November 14, 2003 5:46 PM
To: 'Jakarta Commons Developers List'
Subject: RE: [lang] [Bug 22692] - StringUtils.split ignores empty items

I see what you mean.  It appears that, as robust as CharSet is, it
does way too much, and is slow for what we need it for.

I'm going back to DelimiterSet, but rather than an interface,
it will be an inner class with several constructors:

	public DelimiterSet(char[]);
	public DelimiterSet(String);
	public DelimiterSet(char);

and two useful methods:

	public boolean contains(char);
	public char[] getChars();

This will be an immutable object.  The
constructor sorts the character array
using Arrays.sort, and the contains method
uses Arrays.binarySearch.  This should give
us a pretty efficient algorithm for the
contains method.  There's also a predefined
whitespace delimiter set "WHITESPACE_DELIMITERSET"
so people don't have to construct their own
all the time.
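
The design described above can be sketched roughly as follows. This is
only an illustration, not the uploaded source: it is written as a
standalone class rather than an inner class, and the field name, the
defensive copies, and the characters in WHITESPACE_DELIMITERSET are all
assumptions.

```java
import java.util.Arrays;

// Sketch of an immutable delimiter set backed by a sorted char array.
final class DelimiterSet {

    // Assumed contents; the message does not say which characters
    // the predefined whitespace set holds.
    static final DelimiterSet WHITESPACE_DELIMITERSET =
            new DelimiterSet(" \t\n\r\f");

    // Sorted once in the constructor and never modified afterwards.
    private final char[] chars;

    DelimiterSet(char[] delimiters) {
        // Copy first so the caller's array is neither reordered nor shared.
        chars = delimiters.clone();
        Arrays.sort(chars);
    }

    DelimiterSet(String delimiters) {
        this(delimiters.toCharArray());
    }

    DelimiterSet(char delimiter) {
        this(new char[] { delimiter });
    }

    // Arrays.binarySearch returns a non-negative index only on a match.
    boolean contains(char c) {
        return Arrays.binarySearch(chars, c) >= 0;
    }

    // Returns a copy so callers cannot break the sorted order.
    char[] getChars() {
        return chars.clone();
    }
}
```

With this shape, new DelimiterSet(";,|").contains(',') is a binary
search over three sorted characters, so lookup cost grows
logarithmically with the number of delimiters.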

-----Original Message-----
From: Stephen Colebourne []
Sent: Friday, November 14, 2003 5:26 PM
To: Jakarta Commons Developers List
Subject: Re: [lang] [Bug 22692] - StringUtils.split ignores empty items

An interesting idea, although the performance would be very poor without
some effort in the CharSet class.

From: "Todd V. Jonker" <>
> Or just use lang.CharSet
> On Fri, 14 Nov 2003 16:58:45 -0500, "Inger, Matthew" <>
> said:
> > What about an interface:
> >
> > public class DelimitedTokenizer {
> >
> >    public static interface DelimiterSet {
> >        public boolean isDelimiter(char c);
> >    }
> > }
> >
> > and having the ability to pass in this
> > interface.  Of course, we'd still have a
> > single char version as well, so someone
> > might pass either a single char or an implementation
> > of this interface as the delimiter.  I suppose I could
> > do the same thing for quotes, but I find that less useful.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
