commons-dev mailing list archives

From "Gary Gregory" <>
Subject RE: [lang] Tokenizer
Date Mon, 30 Aug 2004 22:13:33 GMT
> -----Original Message-----
> From: Stephen Colebourne []
> Sent: Saturday, August 28, 2004 05:05
> To: Jakarta Commons Developers List
> Subject: [lang] Tokenizer
> I took a look at Tokenizer this morning and fixed some issues. Others
> need some discussion...
> a) Tokenizer has constructors and methods that take a char[] to process as
> the source text. This char[] is cloned as it is input. It seems that the
> main reason why someone would use the char[] method as opposed to String is
> to get faster performance and avoid cloning.
> I propose the cloning is removed. The class becomes less thread-safe, but
> then it shouldn't be used that way anyway.


The whole char[] aspect of the class seems suspiciously like an
optimization made before measurement. It is hard to tell since there are
no comments as to why we need both char[] AND String constructors.
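
To make the trade-off in a) concrete, here is a minimal sketch of what cloning the char[] buys and costs. CharSource and its defensiveCopy flag are hypothetical names for illustration only; this is not the Tokenizer code.

```java
// Hypothetical stand-in for a class that keeps a char[] as its source text.
final class CharSource {
    private final char[] chars;

    // defensiveCopy == true  : clone the input (one extra allocation and copy),
    //                          so later changes by the caller are not seen.
    // defensiveCopy == false : share the caller's array (no copy), so later
    //                          changes by the caller are visible here.
    CharSource(char[] input, boolean defensiveCopy) {
        this.chars = defensiveCopy ? input.clone() : input;
    }

    String text() {
        return new String(chars);
    }
}
```

With the copy, a caller that modifies its array after construction does not affect the instance. Without it, the instance sees the modification, which is the loss of safety mentioned in a).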

> b) Tokenizer uses a Matcher to spot characters. It seems like this may be
> too restrictive: what if you want a String delimiter?
> I propose to change Matcher to be
>    int isMatch(char[] text, int textLen, int pos)
> Matcher implementations can then check against a string, or could even do
> context based tests, by querying backwards/forwards in the string. PS. I
> have coded this, and it does work.
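
A minimal sketch of how the signature in b) could support a String delimiter. Only the isMatch signature comes from the proposal. The meaning of the int result (number of chars matched, 0 for no match) is an assumption, and StringMatcher and MatcherSketch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Signature as proposed; the return-value convention is an assumption.
interface Matcher {
    /** Returns the number of chars matched at pos, or 0 if nothing matches. */
    int isMatch(char[] text, int textLen, int pos);
}

// Hypothetical matcher that recognises a multi-character delimiter.
final class StringMatcher implements Matcher {
    private final char[] delim;

    StringMatcher(String delim) {
        if (delim.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        this.delim = delim.toCharArray();
    }

    public int isMatch(char[] text, int textLen, int pos) {
        if (pos + delim.length > textLen) {
            return 0; // not enough text left for a full match
        }
        for (int i = 0; i < delim.length; i++) {
            if (text[pos + i] != delim[i]) {
                return 0;
            }
        }
        return delim.length;
    }
}

// Hypothetical driver: splits the text wherever the matcher reports a match.
final class MatcherSketch {
    static List<String> split(String s, Matcher m) {
        char[] text = s.toCharArray();
        List<String> out = new ArrayList<>();
        int start = 0;
        int pos = 0;
        while (pos < text.length) {
            int len = m.isMatch(text, text.length, pos);
            if (len > 0) {
                out.add(new String(text, start, pos - start));
                pos += len;   // skip the whole delimiter
                start = pos;
            } else {
                pos++;
            }
        }
        out.add(new String(text, start, text.length - start));
        return out;
    }
}
```

Because the matcher receives the whole array and a position, an implementation is free to look backwards or forwards from pos, which is what makes context based tests possible.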
> c) Should we add a PairedMatcher to Tokenizer? This would handle
> a=b,c=d,e=f  type strings returning a then b then c... using the first,
> third, fifth delimiter as an equals, but the second, fourth,... as
> a comma. Is this a common enough format to warrant a class/method?

I would say go XP on this one. When someone needs it, add it then.
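
For reference, the alternating-delimiter behaviour described in c) can be sketched in a few lines. PairedSplitSketch is a hypothetical name and this is only an illustration of the format, not a proposed PairedMatcher.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the expected delimiter alternates between two chars.
final class PairedSplitSketch {
    static List<String> split(String s, char first, char second) {
        List<String> out = new ArrayList<>();
        char expected = first; // odd-numbered delimiters use 'first'
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == expected) {
                out.add(s.substring(start, i));
                start = i + 1;
                // Switch to the other delimiter for the next token.
                expected = (expected == first) ? second : first;
            }
        }
        out.add(s.substring(start)); // trailing token
        return out;
    }
}
```

Calling split("a=b,c=d,e=f", '=', ',') yields a, b, c, d, e, f in that order.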

> I did wonder whether it might be better to create a commons-format at this
> point. It could contain Tokenizer and Interpolator. The trouble is what
> happens to FastDateFormat or DurationFormat? In the end, I felt it would be
> more confusing, we just need to control the formats we create.

The boundaries of [lang] have been getting blurry. We are no longer
"covering" java.lang. StringTokenizer is in java.util and we "cover" it
in [lang]. I claim Interpolator "covers" java.text.MessageFormat which
is in yet another package. We have a boatload of String routines in
StringUtils. If we all agree that [lang] should do all things String
then covering 

For Interpolator, I do not like the name. In a different thread I
proposed some alternatives like MappedMessageFormat as a better version
of java.text.MessageFormat.

> Stephen