cocoon-dev mailing list archives

From Bruno Dumon <br...@outerthought.org>
Subject Re: NetUtils / StringUtils / Tokenizer
Date Thu, 29 Apr 2004 13:07:36 GMT
On Thu, 2004-04-29 at 14:44, Stefano Mazzocchi wrote:
> Joerg Heinicke wrote:
> > After Cheche's bug report I had a look at NetUtils. Though Ugo fixed it 
> > with a quick fix, it does not really solve the problem as a test with 
> > the updated NetUtilsTestCase and the NetUtils before my latest commit 
> > can easily show.
> > 
> > My problem is that there are many problems in different places in the code:
> > 
> > 1. o.a.commons.lang.StringUtils.split() behaves strangely:
> > o.a.com.l.SU.split("", "/") returns an empty String[]
> > while
> > o.a.coc.u.SU.split("", "/") returns String[] {""}, which is what I
> > would expect.
> > Also,
> > o.a.com.l.SU.split("/", "/") returns an empty String[].
> > This means the iteration over the string array can never really work 
> > with o.a.commons.lang.StringUtils.
> > 
> > => reverted that change.
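The difference in empty-token handling described above can be illustrated with the JDK's own String.split() as a stand-in (just an analogy, not the commons-lang or Cocoon code itself): a negative limit keeps empty tokens, while the default limit drops trailing ones, much like commons-lang's split() does.

```java
public class SplitCompare {
    public static void main(String[] args) {
        // Negative limit keeps empty tokens -- the behavior Joerg
        // expects from o.a.coc.u.SU.split():
        System.out.println("".split("/", -1).length);  // 1, i.e. {""}
        System.out.println("/".split("/", -1).length); // 2, i.e. {"", ""}
        // Default limit discards trailing empty tokens, so "/" yields
        // an empty array -- analogous to o.a.com.l.SU.split():
        System.out.println("/".split("/").length);     // 0, i.e. {}
    }
}
```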
> > 
> > 
> > 2. our Tokenizer behaves strangely in one case:
> > 
> > o.a.coc.u.SU.split("/", "/")
> > => new Tokenizer("/", "/", false)
> > => tokenizer.countTokens() = 2
> > => but tokenizer.hasMoreTokens() returns true only once
> > => leads to String[] {"", null} while it should be String[] {"", ""}
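A minimal sketch of the split semantics the Tokenizer should provide in this case (a hypothetical helper, not the actual Cocoon Tokenizer or StringUtils code): every delimiter is a boundary and empty tokens are kept, so "/" split on "/" yields {"", ""}.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Hypothetical helper showing the expected semantics:
    // splitKeepEmpty("/", "/") -> {"", ""}
    // splitKeepEmpty("", "/")  -> {""}
    static String[] splitKeepEmpty(String s, String delim) {
        List<String> out = new ArrayList<String>();
        int start = 0;
        int idx;
        while ((idx = s.indexOf(delim, start)) >= 0) {
            out.add(s.substring(start, idx));
            start = idx + delim.length();
        }
        out.add(s.substring(start)); // trailing (possibly empty) token
        return out.toArray(new String[out.size()]);
    }
}
```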
> > 
> > 
> > 3. while the Tokenizer correctly works on "/../" the 
> > NetUtils.normalize() method has a problem with it:
> > 
> > result of split/tokenize: String[] {"", "..", ""}
> > result of normalize(): ""
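One way such a normalize() could handle the {"", "..", ""} case is stack-based segment resolution (a hypothetical sketch, not NetUtils.normalize() itself; here a leading ".." in an absolute path is simply dropped, so "/../" comes out as "/" rather than ""):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class NormalizeSketch {
    // Hypothetical stack-based normalization, not Cocoon's code.
    // Resolves ".." against the previous segment; in an absolute path
    // a leading ".." has nothing to remove and is dropped. Note this
    // sketch does not preserve a trailing slash.
    static String normalize(String path) {
        boolean absolute = path.startsWith("/");
        Deque<String> segs = new ArrayDeque<String>();
        for (String seg : path.split("/")) {
            if (seg.isEmpty() || seg.equals(".")) {
                continue;
            } else if (seg.equals("..")) {
                if (!segs.isEmpty() && !"..".equals(segs.peekLast())) {
                    segs.removeLast();
                } else if (!absolute) {
                    segs.addLast(seg); // keep leading ".." when relative
                }
            } else {
                segs.addLast(seg);
            }
        }
        String joined = String.join("/", segs);
        return absolute ? "/" + joined : joined;
    }
}
```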
> 
> What about using ORO regexp for these things? When I first wrote it I 
> was not really much into regular expressions, but now I do think it 
> makes sense to use them; those guys spent a lot of time making sure 
> such things don't happen.

There are lots of similar utility functions in the class SourceUtil of
the excalibur sourceresolver. There are probably all kinds of subtle
differences though.

I remember changing the URL-parsing code there to use a regexp, but that
gave StackOverflowErrors on long URLs (I think it was using
jakarta-regexp), and many people considered using a regexp for such
critical code a bad idea.

My advice: in the 2.1 series, leave that code as is, especially as we're
close to a release. It's too easy to break it.

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org

