commons-dev mailing list archives

From "Stephen Colebourne" <>
Subject Re: [lang] Tokenizer
Date Mon, 30 Aug 2004 11:15:49 GMT
Any views on this??

----- Original Message -----
From: "Stephen Colebourne" <>
> I took a look at Tokenizer this morning and fixed some issues. Others need
> some discussion...
> a) Tokenizer has constructors and methods that take a char[] to process as
> the source text. This char[] is cloned as it is input. It seems that the
> main reason why someone would use the char[] method as opposed to String is
> to get faster performance and avoid cloning.
> I propose the cloning is removed. The class becomes less thread-safe, but
> then it shouldn't be used that way anyway.
> b) Tokenizer uses a Matcher to spot characters. It seems like this could
> be too restrictive; what if you want a String delimiter?
> I propose to change Matcher to be
>    int isMatch(char[] text, int textLen, int pos)
> Matcher implementations can then check against a string, or could even do
> context based tests, by querying backwards/forwards in the string. PS. I
> have coded this, and it does work.
> c) Should we add a PairedMatcher to Tokenizer? This would handle
> a=b,c=d,e=f  type strings returning a then b then c... using the first,
> third, fifth delimiter as an equals, but the second, fourth,... delimiter
> as a comma. Is this a common enough format to warrant a class/method?
> I did wonder whether it might be better to create a commons-format at this
> point. It could contain Tokenizer and Interpolator. The trouble is what
> happens to FastDateFormat or DurationFormat? In the end, I felt it would
> be more confusing; we just need to control the formats we create.
> Stephen
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
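
To make point b) concrete, here is a minimal sketch of the proposed signature with a String delimiter. Only the method signature comes from the proposal. The return convention (number of chars matched at pos, 0 for no match), the StringMatcher class, and the split() loop are assumptions for illustration, not the code in Tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

final class MatcherSketch {

    // Signature as proposed in b). The return convention is an assumption:
    // number of chars matched starting at pos, or 0 for no match.
    interface Matcher {
        int isMatch(char[] text, int textLen, int pos);
    }

    // Hypothetical implementation that matches a whole String delimiter.
    static final class StringMatcher implements Matcher {
        private final char[] delim;

        StringMatcher(String delim) {
            this.delim = delim.toCharArray();
        }

        public int isMatch(char[] text, int textLen, int pos) {
            // No match if the delimiter is empty or would run past the text.
            if (delim.length == 0 || pos < 0 || pos + delim.length > textLen) {
                return 0;
            }
            for (int i = 0; i < delim.length; i++) {
                if (text[pos + i] != delim[i]) {
                    return 0;
                }
            }
            return delim.length;
        }
    }

    // Toy splitting loop standing in for the tokenizer (hypothetical).
    static List<String> split(String input, Matcher matcher) {
        char[] text = input.toCharArray();
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int pos = 0;
        while (pos < text.length) {
            int len = matcher.isMatch(text, text.length, pos);
            if (len > 0) {
                // Emit the token before the delimiter, then skip the delimiter.
                tokens.add(new String(text, start, pos - start));
                pos += len;
                start = pos;
            } else {
                pos++;
            }
        }
        tokens.add(new String(text, start, text.length - start));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("a::b::c", new StringMatcher("::"))); // [a, b, c]
    }
}
```

Because the matcher sees the whole char[] and the current position, an implementation could also look backwards or forwards from pos for the context-based tests mentioned in b).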

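For point c), the alternating-delimiter idea can be sketched on its own, without Tokenizer. PairedSketch and tokenize() are hypothetical names, and single-char delimiters are an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

final class PairedSketch {

    // Splits on 'first', then 'second', then 'first' again, and so on.
    // Hypothetical helper, not an existing Tokenizer method.
    static List<String> tokenize(String input, char first, char second) {
        List<String> tokens = new ArrayList<>();
        char expected = first;
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == expected) {
                tokens.add(input.substring(start, i));
                start = i + 1;
                // Alternate between the two delimiters after each match.
                expected = (expected == first) ? second : first;
            }
        }
        tokens.add(input.substring(start));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a=b,c=d,e=f", '=', ',')); // [a, b, c, d, e, f]
        System.out.println(tokenize("a=x=y,c=d", '=', ','));   // [a, x=y, c, d]
    }
}
```

The second call shows where this differs from splitting on both characters at once: an equals sign inside a value is left alone, because only the comma is expected at that point.
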