tomcat-dev mailing list archives

From "Bill Barker" <>
Subject Re: cvs commit: jakarta-tomcat RELEASE-PLAN-3.3.1.txt
Date Mon, 04 Feb 2002 08:00:20 GMT

----- Original Message -----
From: <>
To: "Tomcat Developers List" <>
Sent: Sunday, February 03, 2002 10:36 PM
Subject: Re: cvs commit: jakarta-tomcat RELEASE-PLAN-3.3.1.txt

> On Sat, 2 Feb 2002, Bill Barker wrote:
> > >   +        4416  URI En/Decoding not working
> > >   +              (investigate and fix if feasible)
> > My vote is for LATER, since as I understand the bug it is too late to do
> > this well, and the fix (if not done right) has the potential to create
> > security problems.  The fix is to basically flip UEncoder on its head, to
> > work with "un-safe chars" instead of "safe chars" (as well as to add the
> > logic to use the encoding).  If Costin (since it's his baby) thinks he's up
> > to it, by all means go for it.  I just don't want to delay the release for
> > the amount of time it would take me to make and be comfortable with the fix
> > (esp. since there is a work-around already).
> I'm not sure I understand - the bug seems to be about
> DecodeInterceptor using 8859_1 for decoding, even if a different
> decoding was found.
> I don't think it is touching UEncoder and the url encoding/decoding.
> The url decoding has nothing to do with the charset - we decode
> %xx as bytes, the url encoding happens after char->byte and decoding
> happens before byte->char conversions (i.e. uencoding operates on
> bytes).
My understanding of this is that if the request is for:
then most of the time Tomcat will read it correctly. But it will return for
The "safe chars" map to the same code points under iso-latin-1 and utf-8
(that's why they are "safe chars").  UEncoder is strict in what is safe, but
the RFC isn't.  You are allowed to use extended chars if the other side is
capable of detecting the charset.
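
A minimal sketch of the "safe chars" point above, assuming plain ASCII stands in for the safe set (the class and method names are made up for illustration and are not part of Tomcat or UEncoder). It only checks whether a string produces the same bytes under ISO-8859-1 and UTF-8:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SafeCharDemo {
    // True if the string encodes to identical bytes under ISO-8859-1 and UTF-8.
    // Hypothetical helper for illustration only.
    static boolean sameBytesInBothCharsets(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(latin1, utf8);
    }

    public static void main(String[] args) {
        // ASCII-only path: one byte per char in both charsets.
        System.out.println(sameBytesInBothCharsets("index.jsp")); // true
        // U+00E9 is one byte (E9) in ISO-8859-1 but two bytes (C3 A9) in UTF-8.
        System.out.println(sameBytesInBothCharsets("caf\u00e9")); // false
    }
}
```

An extended char such as U+00E9 is where the two charsets diverge, so the receiving side has to know which one the sender used.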
> It is possible we have a bug - and a test case would help finding it. The
> code is quite tricky (I spent huge amounts of time with charset/encoding
> issues), and I agree LATER is good given the risks. But if I have
> the test case, I can take a look, it may be a simple fix.
> The way it is supposed to work - first the bytes are url decoded,
> then we detect the charset, then convert bytes to chars.
> Am I missing something here ?
> Costin
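
The order of operations Costin describes (url-decode to bytes, detect the charset, convert bytes to chars) could be sketched roughly as follows. This is an illustration under stated assumptions, not the DecodeInterceptor code: the class and method names are hypothetical, the input is assumed to be ASCII, and charset detection is not implemented here; the caller simply passes in whichever charset was detected.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeOrderDemo {
    // Step 1: turn %xx escapes into raw bytes. No charset is involved yet.
    static byte[] percentDecode(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != '%') {
                out.write((byte) c); // assumes the undecoded URI is ASCII
                continue;
            }
            if (i + 2 >= raw.length()) {
                throw new IllegalArgumentException("truncated escape at " + i);
            }
            int hi = Character.digit(raw.charAt(i + 1), 16);
            int lo = Character.digit(raw.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("bad escape at " + i);
            }
            out.write((hi << 4) | lo);
            i += 2; // skip the two hex digits
        }
        return out.toByteArray();
    }

    // Steps 2 and 3: the detected charset (supplied by the caller in this
    // sketch) is applied only after the bytes have been recovered.
    static String decode(String raw, Charset detected) {
        return new String(percentDecode(raw), detected);
    }

    public static void main(String[] args) {
        // Same bytes, different charset: only UTF-8 gives the intended text.
        System.out.println(decode("/caf%C3%A9", StandardCharsets.UTF_8));
        System.out.println(decode("/caf%C3%A9", StandardCharsets.ISO_8859_1));
    }
}
```

Decoding the same bytes as ISO-8859-1 regardless of the detected charset yields two chars (U+00C3, U+00A9) where one (U+00E9) was intended, which is the kind of symptom the bug report seems to describe.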