tomcat-dev mailing list archives

From Christopher Schultz <>
Subject Re: [GitHub] [tomcat] alpire opened a new pull request #176: CoyoteAdapter: fix out-of-bounds read in checkNormalize
Date Mon, 01 Jul 2019 15:06:55 GMT

On 6/29/19 16:01, Mark Thomas wrote:
> On 29/06/2019 02:26, GitBox wrote:
>> alpire opened a new pull request #176: CoyoteAdapter: fix
>> out-of-bounds read in checkNormalize URL:
>> On malformed requests, checkNormalize would throw an
>> ArrayIndexOutOfBoundsException leading to a 500 response. This
>> change fixes checkNormalize to return false instead of throwing
>> exception on those inputs, and adds a few tests to check the new
>> functionality.
> That there are URI sequences that can trigger this is certainly 
> something that we need to fix.
> On a slightly different topic, this got me thinking.
> checkNormalize() is a test that runs on every request. It is
> essentially there to ensure that whatever character set has been
> specified for the Connector is "safe". In this case "safe" means
> when the bytes are converted to characters we don't get any
> unexpected "." or "/" characters in the result that make the
> character version of the URI non-normalized.
> Rather than run this test on every request, we could:
> - pre-calculate the character sets we consider to be safe
> - throw an exception if the user attempts to configure the
>   Connector to use one that is not on that list
> I think we could safely exclude any character set where any of the
> following were not true:
> - 0x00 <-> '\u0000'
> - 0x2e <-> '.'
> - 0x2f <-> '/'
> - 0x5c <-> '\'
> We could create a unit test that maintains/checks the list of
> "safe" character set canonical names. After creating the character
> set in Connector.setURIEncoding(), if the canonical name of the
> resulting character set is not in the safe list, throw an
> exception.
> Then remove checkNormalize() and replace with a comment that
> explains why the conversion is known to be safe.
> There are several loops through the URI in checkNormalize(). I
> think - I'd need to test to confirm - that removing them would
> provide a measurable performance benefit.

I'm a little worried about edge cases that might cause a misread of a
multibyte character as several single-byte characters that aren't
supposed to be treated as such. Remember that an attacker is not
obligated to follow the rules of a sane encoding, e.g. UTF-8 or
Shift-JIS or whatever. They can claim one encoding and then send
complete garbage down the line. We need to remain safe in all cases.

So I think checking the string AFTER decoding is the only truly safe
way to operate.
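As an illustration of the class of input involved, here is a small sketch (class name hypothetical) of what Java's standard lenient decoding does with deliberately malformed UTF-8. The bytes 0xC0 0xAE are the overlong two-byte form of '.' (0x2E), which conforming decoders must reject:

```java
import java.nio.charset.StandardCharsets;

public class OverlongDotDemo {
    public static void main(String[] args) {
        // A URI like "/%c0%ae/" on the wire: 0xC0 0xAE is an overlong
        // encoding of '.' (0x2E). A scan of the raw bytes finds no
        // 0x2E at all.
        byte[] raw = { 0x2f, (byte) 0xc0, (byte) 0xae, 0x2f };

        // String(byte[], Charset) decodes leniently, substituting
        // U+FFFD for malformed bytes instead of throwing.
        String decoded = new String(raw, StandardCharsets.UTF_8);

        // No '.' survives here, but whether one appears depends
        // entirely on the decoder's behavior, which is why the decoded
        // string is the only place the result is actually known.
        System.out.println("contains '.': "
                + (decoded.indexOf('.') >= 0));
        System.out.println("contains U+FFFD: "
                + (decoded.indexOf('\uFFFD') >= 0));
    }
}
```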

Note that I can't think of a specific way to break this off the top of
my head.

What kind of performance benefit are you looking at, here? Certainly,
every bit helps, but is the wire-to-servlet pipeline really that long
at this point?

-chris
