tomcat-dev mailing list archives

From Mark Thomas <ma...@apache.org>
Subject URIs, %nn decoding and error handling
Date Fri, 01 Mar 2013 20:36:07 GMT
Due to bug 54602 [1] I have been writing some test cases to examine how 
we handle invalid byte sequences in URIs.

My expectation was:
- valid byte sequence for expected encoding -> 200 (assuming no other
   problems)
- invalid byte sequence for expected encoding -> 400
- partial byte sequence for expected encoding -> 400

However, that isn't what happens, and I currently believe that it is 
what should happen. The purpose of this e-mail is, therefore, to get 
agreement on what should happen.
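
As an illustration of the expected mapping listed above, here is a minimal sketch using only the standard java.nio.charset API. It is not Tomcat code: the class name StrictUriDecodeSketch and the methods percentDecode and statusFor are hypothetical, the URI is assumed to contain only ASCII characters outside its %nn escapes, and UTF-8 is assumed as the expected encoding. Results may differ on a JVM whose UTF-8 decoder has the kind of problem discussed under Issue 2.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUriDecodeSketch {

    /** Turns %nn escapes into raw bytes; other characters are copied as-is (assumed ASCII). */
    static byte[] percentDecode(String uri) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c == '%') {
                if (i + 2 >= uri.length()) {
                    throw new IllegalArgumentException("truncated %nn escape");
                }
                int hi = Character.digit(uri.charAt(i + 1), 16);
                int lo = Character.digit(uri.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("non-hex digit in %nn escape");
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    /** Returns 200 if the decoded bytes are well-formed UTF-8, otherwise 400. */
    static int statusFor(String uri) {
        try {
            CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            // The one-argument decode() treats the buffer as the complete input,
            // so a trailing partial sequence is reported as malformed.
            decoder.decode(ByteBuffer.wrap(percentDecode(uri)));
            return 200;
        } catch (CharacterCodingException | IllegalArgumentException e) {
            return 400;
        }
    }
}
```

For example, statusFor("/caf%C3%A9") covers the valid case, statusFor("/caf%FF") the invalid case, and statusFor("/caf%C3") the partial case.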

There are multiple moving parts here so forgive me if this e-mail gets a 
little long. There are multiple decisions and I expect some to be less 
contentious than others.

These issues were observed with UTF-8. Other encodings may have similar 
issues. My aim is to get a consistent approach regardless of encoding.

Issue 1: URI ends with partial byte sequence
Currently the partial byte sequences are ignored. I think the 
B2CConverter should throw an Exception if the full input (i.e. when 
endOfInput == true) ends with a partial byte sequence.
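
To show why the endOfInput flag matters here, the following sketch drives a plain JDK CharsetDecoder directly. It is not the B2CConverter implementation: the class EndOfInputSketch and its method names are hypothetical. With endOfInput == false a trailing partial sequence is simply left in the input buffer (the decoder is waiting for more bytes), whereas with endOfInput == true the same bytes are reported as an error.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EndOfInputSketch {

    private static CharsetDecoder strictUtf8() {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    /** Decodes with endOfInput == false and returns how many bytes were left unconsumed. */
    static int bytesLeftWhenMoreExpected(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(Math.max(1, bytes.length));
        strictUtf8().decode(in, out, false);
        return in.remaining();
    }

    /** Decodes with endOfInput == true and returns whether the decoder reported an error. */
    static boolean rejectedAtEndOfInput(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(Math.max(1, bytes.length));
        CoderResult result = strictUtf8().decode(in, out, true);
        return result.isError();
    }
}
```

A caller that silently drops the unconsumed bytes after the final call would produce the "ignored" behaviour described above; checking the result of the endOfInput == true call is one way to turn it into a failure.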

Issue 2: URI ends with invalid byte sequence
This appears to be a bug in the UTF-8 decoder provided by the JVM. [1] 
has provided one set of input bytes that triggers this. Currently the 
invalid data is ignored. I think that B2CConverter should throw an 
Exception as soon as it can determine that input is invalid. This would 
require:
- switching to the Harmony based UTF-8 decoder used by WebSocket
- further testing of the JRE and Harmony UTF-8 decoders to check for 
other potential issues
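
As a rough sketch of what "as soon as it can determine that input is invalid" could look like, the snippet below asks a JDK CharsetDecoder (not the Harmony-based one) for the offset of the first malformed sequence while more input is still expected. The class EarlyRejectSketch and the method firstMalformedOffset are hypothetical names, and the byte values used with it are simple illustrations, not the input from [1]; a JVM affected by the decoder problem described above may behave differently for other inputs.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EarlyRejectSketch {

    /**
     * Decodes with endOfInput == false and returns the offset of the first
     * malformed sequence, or -1 if nothing seen so far is invalid.
     */
    static int firstMalformedOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(Math.max(1, bytes.length));
        CoderResult result = decoder.decode(in, out, false);
        // On an error result the input position is at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }
}
```

Note that a trailing partial sequence is not yet invalid at this point; it only becomes an error once the end of the input is known.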

Issue 3: Fall back to 'ASCII'
If the conversion fails (i.e. throws an exception for any reason) [2], 
the CoyoteAdapter attempts to decode the provided URI using 'ASCII' 
rather than the configured connector encoding. I say 'ASCII' because the 
comments say ASCII but it is actually ISO-8859-1.
I don't believe it is appropriate to fall back to anything here. The fall 
back code has been present since conversion support was added but I 
can't think of any scenario where this stands any chance of working 
reliably. I would like to remove this fall back code.
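
One way to see why an ISO-8859-1 fall back cannot work reliably: every byte value maps to some ISO-8859-1 character, so the fall back never fails, it just produces different text. The sketch below is illustrative only (FallbackSketch and decodeLatin1 are hypothetical names, not CoyoteAdapter code).

```java
import java.nio.charset.StandardCharsets;

public class FallbackSketch {

    /** Decodes bytes as ISO-8859-1; this succeeds for any input, one char per byte. */
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

For instance, the UTF-8 bytes C3 A9 (U+00E9) come out as the two characters U+00C3 U+00A9, and the byte FF, which is never valid in UTF-8, comes out as U+00FF. In both cases the application would see a path that differs from the one the client intended, with no indication that anything went wrong.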

I would like to make these changes in trunk and 7.0.x.

I expect to have a similar discussion about request bodies once URIs are 
resolved, where I have essentially the same view: a decoding error 
should lead to a request failure.

Thoughts?

Mark


[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=54602
[2] http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/catalina/connector/CoyoteAdapter.java?view=annotate
    (line ~1054)
