tomcat-users mailing list archives

From André Warnier
Subject Re: Semicolon URI encoding and RFC
Date Mon, 09 May 2011 17:53:02 GMT
Konstantin Kolinko wrote:
> 2011/5/9 André Warnier:
>> (like a space encoded as a "+", and a "+"
>> encoded as %xy),
> Andre, one small correction:
> It sometimes causes confusion, but encoding of space as '+' works only
> in the query part of the URL.
> The unambiguous way to encode a space regardless of its position in the URL is %20.
> Encoding space as '+' is defined by "url encoding" encoding scheme
> defined by HTML standard, in the chapter where it describes how HTML
> forms are submitted.
Agreed, my mistake.
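
To make the distinction concrete, here is a minimal sketch using Python's standard urllib.parse module. Python is only my choice for illustration, and the sample string "a b+c" is made up; nothing here is specific to Tomcat.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Hypothetical sample value containing both a space and a literal plus sign.
text = "a b+c"

# Generic percent-encoding: valid in any part of the URL.
path_encoded = quote(text, safe="")      # 'a%20b%2Bc'

# HTML form ("url encoding") style: space becomes '+', a real '+' becomes %2B.
form_encoded = quote_plus(text)          # 'a+b%2Bc'

# Decoding shows why the context matters: '+' only means "space"
# when the decoder applies the form-encoding rules.
decoded_generic = unquote("a+b")         # 'a+b'  (plus sign kept as-is)
decoded_form = unquote_plus("a+b")       # 'a b'  (plus sign read as a space)

print(path_encoded, form_encoded, decoded_generic, decoded_form)
```

The same "a+b" decodes to two different strings depending on which rules are applied, which is where the confusion usually starts.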
Also, in the query string part, an unencoded ";" could be taken as a query parameter
separator (an alternative to "&"), no?
But I forget which RFC that is, if any.
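
As an illustration of why an unencoded ";" in the query string is risky, here is a small sketch with Python's urllib.parse.parse_qs. It assumes a recent Python 3 (3.10 or later), where the separator is an explicit parameter defaulting to "&"; the query string itself is invented for the example. Other parsers may behave differently.

```python
from urllib.parse import parse_qs

# Hypothetical query string using ';' between two parameters.
query = "a=1;b=2"

# With the default separator '&', the ';' is just part of the value of 'a'.
as_ampersand = parse_qs(query)                 # {'a': ['1;b=2']}

# If the parser is told to split on ';', the same string yields two parameters.
as_semicolon = parse_qs(query, separator=";")  # {'a': ['1'], 'b': ['2']}

print(as_ampersand)
print(as_semicolon)
```

So the same bytes give one parameter or two depending on the parser's convention, which is a good reason to percent-encode a literal ";" as %3B in query values.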

Now one additional comment. You said:
>> about SEOability and user-friendliness - this especially concerns path
>> with international characters in URLs, e.g. http://site/pathąčęė
>
> That is up to the browser how to show those URLs. Many browsers have a
> setting how to display such URLs.  E.g. try to browse non-English
> Wikipedia for an example of i18n addresses.

I think that the above is a bit confusing.
The "site" (or hostname) part of the URL is subject to a different encoding than the
path part (/pathąčęė).  The path part must be URL-encoded, but the hostname part
uses "punycode".
Just another example of the current mess with character sets and encodings...
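
A rough sketch of the two encodings side by side, again in Python and only as an illustration. The hostname "bücher.example" is a made-up example, and Python's built-in "idna" codec implements the older IDNA 2003 rules, so treat the result as indicative rather than as what every browser does.

```python
from urllib.parse import quote

# Hypothetical internationalized hostname and the path from the example above.
host = "bücher.example"
path = "/pathąčęė"

# Hostname: each non-ASCII label is converted to punycode with an "xn--" prefix.
ascii_host = host.encode("idna").decode("ascii")   # 'xn--bcher-kva.example'

# Path: the UTF-8 bytes of each non-ASCII character are percent-encoded.
ascii_path = quote(path)   # '/path%C4%85%C4%8D%C4%99%C4%97'

print("http://" + ascii_host + ascii_path)
```

Two completely different schemes end up in the same URL, one for the part before the first "/" and one for everything after it.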

I guess one has to have a first or last name containing so-called "diacritic" characters 
to really appreciate these issues.
