commons-dev mailing list archives

From "Gary Gregory" <ggreg...@seagullsoftware.com>
Subject RE: [lang] Tokenizer
Date Mon, 30 Aug 2004 22:13:33 GMT
> -----Original Message-----
> From: Stephen Colebourne [mailto:scolebourne@btopenworld.com]
> Sent: Saturday, August 28, 2004 05:05
> To: Jakarta Commons Developers List
> Subject: [lang] Tokenizer
> 
> I took a look at Tokenizer this morning and fixed some issues. Others
> need some discussion...
> 
> a) Tokenizer has constructors and methods that take a char[] to process
> as the source text. This char[] is cloned as it is input. It seems that
> the main reason why someone would use the char[] method as opposed to
> String is to get faster performance and avoid cloning.
> 
> I propose the cloning is removed. The class becomes less thread-safe,
> but then it shouldn't be used that way anyway.

+1. 

The whole char[] aspect of the class seems suspiciously like an
optimization made before measurement. It is hard to tell since there are
no comments as to why we need both char[] AND String constructors.
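
To make the trade-off concrete, here is a minimal sketch of the two
constructor styles. CloningSource and SharingSource are hypothetical
stand-ins, not the actual Tokenizer code; they only show what the caller
can observe once the defensive copy is gone.

```java
public class CharArrayCloneSketch {

    /** Hypothetical holder that copies its input, as Tokenizer is described as doing today. */
    static final class CloningSource {
        private final char[] chars;

        CloningSource(char[] input) {
            this.chars = input.clone(); // defensive copy: O(n) time and memory
        }

        char charAt(int index) {
            return chars[index];
        }
    }

    /** Hypothetical holder that keeps the caller's array, as the proposal suggests. */
    static final class SharingSource {
        private final char[] chars;

        SharingSource(char[] input) {
            this.chars = input; // no copy: later writes by the caller are visible here
        }

        char charAt(int index) {
            return chars[index];
        }
    }

    public static void main(String[] args) {
        char[] text = "a,b".toCharArray();
        CloningSource cloned = new CloningSource(text);
        SharingSource shared = new SharingSource(text);

        text[0] = 'z'; // caller mutates the array after construction

        System.out.println(cloned.charAt(0)); // prints a
        System.out.println(shared.charAt(0)); // prints z
    }
}
```

Without the clone, the caller has to promise not to touch the array while
it is being tokenized. Whether the saved copy is worth that is exactly the
thing that should be measured first.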

> 
> 
> b) Tokenizer uses a Matcher to spot characters. It seems like this could
> be too restrictive: what if you want a String delimiter?
> 
> I propose to change Matcher to be
>    int isMatch(char[] text, int textLen, int pos)
> Matcher implementations can then check against a string, or could even
> do context based tests, by querying backwards/forwards in the string. PS.
> I have coded this, and it does work.
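
For illustration, a sketch of how a String delimiter could sit behind the
proposed signature. The message does not say what the int return value
means; this sketch assumes it is the number of characters matched at pos,
with 0 meaning no match. StringMatcher and split are hypothetical names,
not the code Stephen refers to.

```java
import java.util.ArrayList;
import java.util.List;

public class StringMatcherSketch {

    /** Signature as proposed in the message; the return-value meaning is an assumption. */
    interface Matcher {
        int isMatch(char[] text, int textLen, int pos);
    }

    /** Hypothetical matcher for a multi-character delimiter. */
    static final class StringMatcher implements Matcher {
        private final char[] delim;

        StringMatcher(String delim) {
            this.delim = delim.toCharArray();
        }

        @Override
        public int isMatch(char[] text, int textLen, int pos) {
            // Assumed contract: return the matched length, or 0 when nothing matches.
            if (delim.length == 0 || pos < 0 || pos + delim.length > textLen) {
                return 0;
            }
            for (int i = 0; i < delim.length; i++) {
                if (text[pos + i] != delim[i]) {
                    return 0;
                }
            }
            return delim.length;
        }
    }

    /** Hypothetical driver loop: cuts a token at every position where the matcher fires. */
    static List<String> split(char[] text, Matcher matcher) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int pos = 0;
        while (pos < text.length) {
            int matched = matcher.isMatch(text, text.length, pos);
            if (matched > 0) {
                tokens.add(new String(text, start, pos - start));
                pos += matched; // skip the whole delimiter
                start = pos;
            } else {
                pos++;
            }
        }
        tokens.add(new String(text, start, text.length - start));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("a::b::c".toCharArray(), new StringMatcher("::")));
        // prints [a, b, c]
    }
}
```

Because the matcher receives the whole array and a position, an
implementation is free to look at neighbouring characters, which is what
makes the context-based tests mentioned above possible.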
> 
> c) Should we add a PairedMatcher to Tokenizer? This would handle
> a=b,c=d,e=f type strings, returning a then b then c... using the first,
> third, fifth delimiter as an equals, but the second, fourth,... delimiter
> as a comma. Is this a common enough format to warrant a class/method?

I would say go XP on this one. When someone needs it, add it then.
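
For reference, the alternating-delimiter behaviour described in (c) is
small enough to sketch without touching Tokenizer at all. The method name
splitAlternating is hypothetical, and this is only one reading of the
idea, not a proposed API.

```java
import java.util.ArrayList;
import java.util.List;

public class PairedDelimiterSketch {

    /**
     * Splits on two delimiters that must alternate: the 1st, 3rd, 5th... cut
     * happens at 'first', the 2nd, 4th... cut happens at 'second'.
     */
    static List<String> splitAlternating(String text, char first, char second) {
        List<String> tokens = new ArrayList<>();
        char expected = first;
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == expected) {
                tokens.add(text.substring(start, i));
                start = i + 1;
                // Flip to the other delimiter for the next cut.
                expected = (expected == first) ? second : first;
            }
        }
        tokens.add(text.substring(start));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitAlternating("a=b,c=d,e=f", '=', ','));
        // prints [a, b, c, d, e, f]
    }
}
```

One side effect of alternating is that a value may contain the other
delimiter: with this sketch "a=b=c,d=e" yields a, b=c, d, e.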

> 
> 
> I did wonder whether it might be better to create a commons-format at
> this point. It could contain Tokenizer and Interpolator. The trouble is
> what happens to FastDateFormat or DurationFormat? In the end, I felt it
> would be more confusing; we just need to control the formats we create.

The boundaries of [lang] have been getting blurry. We are no longer
"covering" java.lang. StringTokenizer is in java.util and we "cover" it
in [lang]. I claim Interpolator "covers" java.text.MessageFormat, which
is in yet another package. We have a boatload of String routines in
StringUtils. If we all agree that [lang] should do all things String
then covering 

For Interpolator, I do not like the name; in a different thread I
proposed some alternatives like MappedMessageFormat as a better version
of java.text.MessageFormat.

Gary
> 
> Stephen
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org

