commons-dev mailing list archives

From Ola Berg <>
Subject Re: [lang] StringUtils containsOnly method
Date Thu, 22 Aug 2002 21:23:49 GMT
Sorry, my last mail started another thread; it was intended to go here, as a case against
dropping char set / char class handling. Clarification:

I see two reasons not to drop simple char class handling:

1) Efficiency. When munching large numbers of chars, the slight overhead that a regular
expression adds in each iteration adds up to a major cost. A simple approach is needed sometimes.
I created my char handling stuff when I was optimizing different code bases that were using
ORO, Regexp and String shuffling. I very much needed a streams-based approach.

Also, when doing servlets, the memory consumed by one operation must be multiplied by the
number of parallel sessions doing the same operation. In my last project we did heavy munching
with many parallel users. Regexps were too clumsy, both in performance and in memory footprint.
We had to drop some features when we skipped regexps, but what could we do?

2) Char classes, and the logic handling them, are now duplicated across different projects. But
char classes defined for one project would benefit other projects, so I think all char
classing projects should share the same char class interface and implementations.
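
To make point 2 concrete, here is a rough sketch of what a shared char class contract could
look like. All names here (CharClass, CharClasses, anyOf, and this containsOnly signature) are
made up for illustration; this is not the API of [lang] or of my own package.

```java
/** Hypothetical shared contract: a char class only answers "is this char a member?". */
interface CharClass {
    boolean contains(char c);
}

/** Hypothetical holder for reusable char classes and simple tests built on them. */
final class CharClasses {
    private CharClasses() {}

    /** ASCII digits 0-9. */
    static final CharClass DIGIT = c -> c >= '0' && c <= '9';

    /** Whatever java.lang.Character considers whitespace. */
    static final CharClass WHITESPACE = c -> Character.isWhitespace(c);

    /** A char class made of exactly the chars in the given string. */
    static CharClass anyOf(String chars) {
        return c -> chars.indexOf(c) >= 0;
    }

    /** True if every char of text belongs to charClass (vacuously true for empty text). */
    static boolean containsOnly(CharSequence text, CharClass charClass) {
        for (int i = 0; i < text.length(); i++) {
            if (!charClass.contains(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With something like this, CharClasses.containsOnly("12345", CharClasses.DIGIT) is a plain loop
over the chars with no pattern compilation and no per-call allocation, and any project that
accepts a CharClass can reuse classes defined elsewhere.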

As for the question of how far one should go: I use

1) simple char class tests for text processing that can be done strictly sequentially w/o
read-ahead or remembering state (such as read until whitespace, skip to next comma etc.)

2) My parser package for everything that is solved sequentially, but with one char read ahead.

3) A regexp package or a tailored dedicated parser for anything more complicated than that.
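
As an illustration of level 1 in the list above, the following is a minimal sketch of "read
until whitespace" and "skip to next comma" on a stream, with no read-ahead and no remembered
state. The class and method names (SequentialScan, readUntil, skipPast) are hypothetical, and
one consequence of having no read-ahead is that the char that ends a read is consumed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.IntPredicate;

/** Hypothetical helpers for strictly sequential scanning of a character stream. */
final class SequentialScan {
    private SequentialScan() {}

    /**
     * Collects chars until one matches stop or the input ends.
     * The matching char is consumed and not returned, since nothing is pushed back.
     */
    static String readUntil(Reader in, IntPredicate stop) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && !stop.test(c)) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    /** Discards chars up to and including target; returns false if the input ends first. */
    static boolean skipPast(Reader in, char target) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c == target) {
                return true;
            }
        }
        return false;
    }
}
```

For example, on new StringReader("alpha beta,gamma"), a readUntil with a whitespace test
returns "alpha", skipPast(in, ',') then discards "beta,", and a second readUntil returns
"gamma". As soon as the ending char must be left in the stream for the next step, you need
one char of read-ahead, which is where level 2 starts.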

Is that a good border between simple text parsing and full-fledged regexps? I think that simple
text parsing should go into lang.


