tomcat-users mailing list archives

From Konstantin Kolinko
Subject Re: Semicolon URI encoding and RFC
Date Mon, 09 May 2011 16:37:22 GMT
2011/5/9 Mindaugas Žakšauskas:
> On Mon, May 9, 2011 at 2:03 PM, Konstantin Kolinko wrote:
> <..>
>> If ";" is part of the actual path, it must be escaped.
>> If ";" starts a "path parameter" it must be unescaped. One well-known
>> example is ";jsessionid" path parameter.
> Thanks for your answer. Is this rule just a "de facto" rule, or is it
> documented anywhere in RFC3986/RFC2396?

As you wrote, it is documented in RFC 3986; see [1].

> Extending my question, are there clear criteria that define
> which characters always need escaping and which don't? At the moment I
> am escaping everything that is not unreserved [1], but I am not sure
> about SEOability and user-friendliness - this especially concerns path
> with international characters in URLs, e.g. http://site/pathąčęė

How those URLs are displayed is up to the browser. Many browsers have a
setting that controls how such URLs are shown. For an example of i18n
addresses, try browsing a non-English Wikipedia.

> I have also found a similar Tomcat bug [2], but it is addressing
> slightly different issue.

[2] is not a bug. It is an invalid report. It is useful reading, though.

> If anyone wants to benefit from this, I have just added 50 bonus points
> to my SO question [3]. The main question I want answered is: which
> characters can be escaped and which need escaping, both in terms of the
> RFC and Tomcat?

> 1. According to RFC 3986, unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
> 2.
> 3.
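
The "escape everything that is not unreserved" approach described in the
quoted text can be sketched roughly as below. This is only an
illustration, not Tomcat code: the class name PathSegmentEscaper and the
method escape() are made up for this example, and it assumes the segment
is encoded as UTF-8 before percent-encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat or the JDK.
final class PathSegmentEscaper {
    // RFC 3986 unreserved characters other than ALPHA and DIGIT.
    private static final String UNRESERVED_EXTRA = "-._~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Percent-encodes every byte of the UTF-8 form of the segment
    // that is not an RFC 3986 "unreserved" character.
    static String escape(String segment) {
        StringBuilder out = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean unreserved = (v >= 'A' && v <= 'Z')
                    || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9')
                    || UNRESERVED_EXTRA.indexOf(v) >= 0;
            if (unreserved) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A literal ';' inside a path segment is escaped.
        System.out.println(escape("a;b")); // a%3Bb
        // "path" followed by four Lithuanian letters (written as escapes
        // so the source file encoding does not matter).
        System.out.println(escape("path\u0105\u010D\u0119\u0117"));
        // path%C4%85%C4%8D%C4%99%C4%97
    }
}
```

Note that this escapes a ";" unconditionally, so it is only suitable for a
";" that belongs to the path itself. A ";" that starts a path parameter
such as ";jsessionid" has to be appended unescaped after the segment.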

BTW, take a look at the java.net.URI class and its URI.toString() and
URI.toURL() methods.

Just one example (not 100% related to your case, but one that comes up
often): to convert a File to a proper URL, the correct code is to call
File.toURI() first and then URI.toURL() on the result, because that
takes care of % encodings, while the old File.toURL() method does not.
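
The File-to-URL conversion mentioned above can be sketched as follows.
The class name FileUrlExample and the sample file name are made up for
illustration; File.toURI() and URI.toURL() are the standard JDK methods.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical example class; only the JDK calls are real API.
final class FileUrlExample {
    // Goes through java.net.URI so that characters such as spaces
    // and '%' in the file name are percent-encoded in the result.
    static URL toUrl(File file) {
        try {
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            // Not expected for "file:" URIs; rethrown unchecked for brevity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        File file = new File("my docs", "report 100%.txt");
        // The exact prefix depends on the platform and the current
        // directory, but the spaces and '%' come out percent-encoded.
        System.out.println(toUrl(file));
    }
}
```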

2011/5/9 André Warnier:
> (like a space encoded as a "+", and a "+"
> encoded as %xy),

Andre, one small correction:
It sometimes causes confusion, but encoding a space as '+' works only
in the query part of the URL.
The unambiguous way to encode a space regardless of its position in the
URL is %20.

Encoding a space as '+' comes from the "url encoding" scheme
(application/x-www-form-urlencoded) defined by the HTML standard, in the
chapter that describes how HTML forms are submitted.
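
The difference between the two encodings can be seen with standard JDK
classes. A small sketch follows; the class name SpaceEncodingExample and
its two helper methods are made up for illustration. URLEncoder
implements the HTML form encoding, while the multi-argument java.net.URI
constructor quotes characters that are illegal in a path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical example class; only the JDK calls are real API.
final class SpaceEncodingExample {
    // HTML form encoding: suitable for query parameter values only.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Path encoding via java.net.URI: a space becomes %20, while '+'
    // is a legal path character and is left as is.
    static String pathEncode(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a b+c"));  // a+b%2Bc
        System.out.println(pathEncode("/a b+c")); // /a%20b+c
    }
}
```

So a '+' in the path stays a literal plus sign, and only form decoding of
the query part turns it back into a space.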

Best regards,
Konstantin Kolinko
