tomcat-users mailing list archives

From Tobias Brennecke <tb_tom...@headissue.com>
Subject + character is always encoded in url path after request dispatch since #59317
Date Fri, 21 Apr 2017 16:27:41 GMT
Dear list,

In https://bz.apache.org/bugzilla/show_bug.cgi?id=59317,
HttpServletRequest.getRequestURI() was changed, from Tomcat 7.0.70
onwards, to always return an encoded URI, which matches the Servlet 3.0
specification. However, the encoding of the path component of the URL
seems to be incorrect, so I wanted to raise the issue on the mailing
list before opening a bug ticket. I could not find any related ticket
in the bug tracker or any newer discussion in the mailing list archives
since the problem was fixed.

My apologies if this mail is a bit lengthy, but please bear with me as I
want to provide a thorough problem description.

The dispatchersUseEncodedPaths context attribute was introduced in
ticket #59317 to revert to the "old" behavior. However, this is still
broken: + characters in dispatched URIs appear to be encoded regardless
of whether the attribute is set to "true" or "false" (that is, the + is
never kept literally).
Please note that "+" is a perfectly valid character in the path
component of a URL and has no special meaning there (unlike in a query
string such as ?foo=bar+baz, where it stands for a space). For instance,
Google uses it as a literal character in https://plus.google.com/+Google,
while https://plus.google.com/%20Google returns a 404.
I will return to the question of whether + is a valid character in a URL
path further below.

= Problem statement =
We use + characters in URLs such as
https://www.example.com/myservlet/url+with+spaces/sub+url.html
which are handled by an HttpServlet.
For each of these URLs there is also a prefixed version for partners, e.g.
https://example.com/prefix/myservlet/url+with+spaces/sub+url.html

If such a prefix is encountered, a servlet Filter removes it and
dispatches the request to the URL without the prefix, e.g.
/prefix/myservlet/url+with+spaces/sub+url.html is dispatched to
/myservlet/url+with+spaces/sub+url.html, which in turn is handled by the
HttpServlet.
(That is, in the Filter:
request.getRequestDispatcher("/myservlet/url+with+spaces/sub+url.html").forward(request,
response);)
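For readers who want to follow along, here is a minimal sketch of the path rewriting the Filter performs. The class name PartnerPrefix, the helper strip() and the fixed prefix "/prefix" are hypothetical (taken from the example above, not from our actual code); the servlet API is left out so the snippet compiles with the JDK alone, and the forward call only appears as a comment.

```java
/**
 * Hypothetical stand-in for the prefix-stripping logic of the Filter described above.
 * The servlet API is omitted so this compiles with the JDK alone; in the real Filter
 * the result would be used as
 *   request.getRequestDispatcher(target).forward(request, response);
 */
final class PartnerPrefix {
    // Assumed prefix, taken from the example URL above.
    static final String PREFIX = "/prefix";

    /** Removes the partner prefix if present; the '+' characters are left untouched. */
    static String strip(String path) {
        if (path.startsWith(PREFIX + "/")) {
            return path.substring(PREFIX.length());
        }
        return path;
    }

    public static void main(String[] args) {
        // Both calls print /myservlet/url+with+spaces/sub+url.html
        System.out.println(strip("/prefix/myservlet/url+with+spaces/sub+url.html"));
        System.out.println(strip("/myservlet/url+with+spaces/sub+url.html"));
    }
}
```

The point of the sketch is that the dispatch target handed to Tomcat still contains literal + characters; any %2B or %20 seen later is introduced during the dispatch, not by the Filter.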
When HttpServletRequest.getRequestURI() is called in the Servlet, the
return values are as follows:

For Tomcat <= 7.0.69:
Calling the url directly: /myservlet/url+with+spaces/sub+url.html
Calling the url with a prefix: /myservlet/url+with+spaces/sub+url.html

Since 7.0.70, the return value of request.getRequestURI() in the
Servlet is very inconsistent:
Calling the URL directly: /myservlet/url+with+spaces/sub+url.html

Calling the prefixed URL, depending on the value of
dispatchersUseEncodedPaths:
With "false" (note the %2B instead of +):
/myservlet/url%2Bwith%2Bspaces/sub%2Burl.html
With "true" (+ is replaced by %20):
/myservlet/url%20with%20spaces/sub%20url.html

In either case, the result does not match the value returned when the
URL is called directly, and, worse, the default behavior does not even
yield a URI equivalent to the original URL.
The expected behavior is that the "+" is neither encoded (as with
"false") nor replaced by a space (as with "true"); it should not be
encoded at all.
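To make the mismatch concrete, here is a minimal sketch that compares the three values above and decodes them with an RFC 3986-style percent-decoder, which, unlike form decoding, leaves "+" alone. The class PercentDecode is hypothetical and not Tomcat code. The raw value for "false" differs from the direct one even though it decodes to the same path, while the value for "true" decodes to a path containing spaces, i.e. a different resource.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical RFC 3986-style percent-decoder: only %XX escapes are decoded, '+' stays '+'. */
final class PercentDecode {
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    static String decode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()
                    && hexValue(s.charAt(i + 1)) >= 0 && hexValue(s.charAt(i + 2)) >= 0) {
                // A valid %XX escape: emit the encoded octet.
                bytes.write((hexValue(s.charAt(i + 1)) << 4) | hexValue(s.charAt(i + 2)));
                i += 3;
            } else {
                // Any other character (including '+') is copied as its UTF-8 bytes.
                int cp = s.codePointAt(i);
                byte[] b = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                bytes.write(b, 0, b.length);
                i += Character.charCount(cp);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String direct = "/myservlet/url+with+spaces/sub+url.html";
        String withFalse = "/myservlet/url%2Bwith%2Bspaces/sub%2Burl.html";
        String withTrue = "/myservlet/url%20with%20spaces/sub%20url.html";
        System.out.println(direct.equals(withFalse));         // false: the raw URIs differ
        System.out.println(direct.equals(decode(withFalse))); // true
        System.out.println(decode(withTrue));                 // /myservlet/url with spaces/sub url.html
    }
}
```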

The reason is that "+" is not in the list of safe characters of
Catalina's URLEncoder.DEFAULT, see
https://github.com/apache/tomcat/blob/trunk/java/org/apache/catalina/util/URLEncoder.java
As URLEncoder.DEFAULT is used everywhere in the changeset for ticket
#59317 mentioned at the beginning of this mail, "+" characters will
always be encoded. See
https://github.com/apache/tomcat/commit/eb195bebac8239b994fa921aeedb136a93e4ccaf#diff-8b91a9296e19012bf6be4bdf975fab0d
for details.
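The effect of a missing safe character can be reproduced in a few lines. The encoder below (SafeSetEncoder) is hypothetical and is not Tomcat's org.apache.catalina.util.URLEncoder; its baseline safe set (letters, digits, "-._~" and "/") is an assumption. With "+" absent from the safe set, every "+" in the path becomes %2B, as in the "false" case above; adding "+" keeps the path intact.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/**
 * Hypothetical percent-encoder driven by a set of "safe" (never encoded) ASCII characters.
 * Not org.apache.catalina.util.URLEncoder; the baseline safe set below is an assumption.
 */
final class SafeSetEncoder {
    private final BitSet safe = new BitSet(128);

    SafeSetEncoder(String extraSafe) {
        for (char c = 'a'; c <= 'z'; c++) safe.set(c);
        for (char c = 'A'; c <= 'Z'; c++) safe.set(c);
        for (char c = '0'; c <= '9'; c++) safe.set(c);
        // Assumed baseline: RFC 3986 unreserved punctuation plus the path separator.
        for (char c : "-._~/".toCharArray()) safe.set(c);
        for (char c : extraSafe.toCharArray()) safe.set(c);
    }

    /** Encodes every UTF-8 byte that is not a safe ASCII character as %XX. */
    String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 128 && safe.get(v)) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String path = "/myservlet/url+with+spaces/sub+url.html";
        System.out.println(new SafeSetEncoder("").encode(path));  // /myservlet/url%2Bwith%2Bspaces/sub%2Burl.html
        System.out.println(new SafeSetEncoder("+").encode(path)); // /myservlet/url+with+spaces/sub+url.html
    }
}
```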

= On the validity of "+" in URLs =
An HTTP URL typically consists of a scheme, host, path and query. Let's
focus on the last two: for /foo+bar?baz=a+b, the path is /foo+bar and
the query is baz=a+b.
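The JDK's java.net.URI, which parses according to RFC 2396 rather than form-encoding rules, splits the example the same way and treats "+" in the path as an ordinary character. A minimal sketch (the class and helper names are illustrative):

```java
import java.net.URI;

/** Splits a request target into raw path and raw query using java.net.URI. */
final class SplitTarget {
    static String rawPath(String target) { return URI.create(target).getRawPath(); }
    static String rawQuery(String target) { return URI.create(target).getRawQuery(); }

    public static void main(String[] args) {
        String target = "/foo+bar?baz=a+b";
        System.out.println(rawPath(target));              // /foo+bar
        System.out.println(rawQuery(target));             // baz=a+b
        // URI only decodes %XX escapes, so '+' also survives in the decoded path.
        System.out.println(URI.create(target).getPath()); // /foo+bar
    }
}
```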
While the + character has a special meaning (a space) in the query
string, this is not the case in the path, where it is just a regular
character. Although the encodings of the path and the query string are
somewhat similar, they are NOT the same!
The query string uses the application/x-www-form-urlencoded format, but
the path does not.
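The difference is easy to see with the JDK's form codec, java.net.URLEncoder and java.net.URLDecoder, which implement application/x-www-form-urlencoded and are therefore meant for query strings, not paths. In this minimal sketch (the class name is illustrative; the Charset overloads need Java 10 or later), form-decoding the path segment foo+bar yields "foo bar", and form-encoding a literal + yields %2B. These two transformations resemble the values reported above, but that is only an analogy, not a claim about how Tomcat's dispatch code is implemented.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Form (application/x-www-form-urlencoded) rules: right for query strings, wrong for paths. */
final class FormVsPath {
    static String formDecode(String s) { return URLDecoder.decode(s, StandardCharsets.UTF_8); }
    static String formEncode(String s) { return URLEncoder.encode(s, StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        System.out.println(formDecode("a+b"));     // a b        (correct for the query value baz=a+b)
        System.out.println(formDecode("foo+bar")); // foo bar    (wrong for the path segment foo+bar)
        System.out.println(formEncode("foo+bar")); // foo%2Bbar  (a literal '+' is escaped in form data)
        System.out.println(formEncode("foo bar")); // foo+bar    (a space becomes '+' in form data)
    }
}
```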
 
= See also =
Question on Stack Overflow:
stackoverflow.com/questions/1005676/urls-and-plus-signs
Blog post listing the valid characters in each URI component, see the
section "The reserved characters are different for each part":
https://web-beta.archive.org/web/20150509184317/http://blog.lunatech.com:80/2009/02/03/what-every-web-developer-must-know-about-url-encoding

Relevant RFCs:
https://tools.ietf.org/html/rfc3986#section-2.2
https://tools.ietf.org/html/rfc3986#section-2.3
Note that the set of reserved characters differs for each scheme and
URI component, as also stated in the blog post above.
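The relevant grammar in RFC 3986 (section 3.3: a path segment consists of pchar, where pchar = unreserved / pct-encoded / sub-delims / ":" / "@", and sub-delims include "+") can be checked mechanically. The validator below is a hypothetical, simplified sketch: it only checks which characters may appear in the path, not the full path grammar (e.g. the rule about a leading "//").

```java
/**
 * Simplified RFC 3986 check: may every character of an absolute path appear there?
 * Only the character rules are checked (segments of pchar separated by '/'),
 * not the other constraints of the path grammar.
 */
final class Rfc3986Path {
    // sub-delims from RFC 3986, section 2.2; note that '+' is one of them.
    private static final String SUB_DELIMS = "!$&'()*+,;=";

    private static boolean isAlphaOrDigit(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" (section 2.3). */
    static boolean isUnreserved(char c) {
        return isAlphaOrDigit(c) || "-._~".indexOf(c) >= 0;
    }

    /** pchar without pct-encoded, which is handled separately below. */
    static boolean isLiteralPchar(char c) {
        return isUnreserved(c) || SUB_DELIMS.indexOf(c) >= 0 || c == ':' || c == '@';
    }

    static boolean isValidPathChars(String path) {
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '/' || isLiteralPchar(c)) continue;
            if (c == '%' && i + 2 < path.length()
                    && isHexDigit(path.charAt(i + 1)) && isHexDigit(path.charAt(i + 2))) {
                i += 2; // skip the two hex digits of a pct-encoded octet
                continue;
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidPathChars("/myservlet/url+with+spaces/sub+url.html")); // true
        System.out.println(isValidPathChars("/+Google"));                                 // true
        System.out.println(isValidPathChars("/url with spaces"));                         // false
        System.out.println(isValidPathChars("/url%20with%20spaces"));                     // true
    }
}
```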

Definition of the HTTP URI scheme in RFC 7230, sections 2.7.1 and
2.7.3 (p. 17ff):
https://tools.ietf.org/html/rfc7230

To my knowledge, nothing in the above RFCs states that a + must be
encoded in the path component of a URI or that it has a special
meaning there (unlike in query strings).

Follow-up discussion after #59317 was fixed:
http://marc.info/?l=tomcat-user&m=146800805502015

= How do other servlet containers handle this? =
For Jetty I found the following issue:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=435017

= Reproducing the problem =
I created the following Gist (to keep this mail shorter):
https://gist.github.com/tburny/468e635c176752f21251fc641450594d
I ran it with Tomcat 7.0.69 and 7.0.77, but I would assume that all
versions affected by #59317 show the behavior I described.

My question is whether this behavior is intended or if this is a bug.

As I'm a native German speaker, I apologize for any grammar mistakes or
misspellings. Thank you for your efforts and patience while reading this
mail.


Kind regards,

Tobias Brennecke



