tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: Semicolon URI encoding and RFC
Date Mon, 09 May 2011 17:53:02 GMT
Konstantin Kolinko wrote:
..
> 
> 2011/5/9 André Warnier <aw@ice-sa.com>:
>> (like a space encoded as a "+", and a "+"
>> encoded as %xy),
> 
> Andre, one small correction:
> It sometimes causes confusion, but encoding of space as '+' works only
> in the query part of the URL.
> The unambiguous way to encode a space regardless of its position in the URL is %20.
> 
> Encoding space as '+' is defined by the "url encoding" scheme
> (application/x-www-form-urlencoded) of the HTML standard, in the chapter
> that describes how HTML forms are submitted.
> 
Agreed, my mistake.
Also, in the query string part, an unencoded ";" could be taken as a query parameter
separator (an alternative to "&"), no?
But I forget which RFC that is, if any.
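
To make the "+" versus "%20" point concrete, here is a minimal Java sketch (Java 10 or later). The class and method names are made up for illustration, and parseQuery is a hypothetical lenient parser that accepts ";" as well as "&". It is not a description of what Tomcat itself does.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryEncodingSketch {

    // Form-style encoding: a space becomes "+", a literal "+" becomes "%2B".
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Path encoding via java.net.URI: a space becomes "%20", "+" is left alone.
    static String pathEncode(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical lenient parser (an assumption, not Tomcat's behaviour):
    // splits on both "&" and ";", then form-decodes names and values.
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("[&;]")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a b+c"));          // a+b%2Bc
        System.out.println(pathEncode("/a b+c"));         // /a%20b+c
        System.out.println(parseQuery("a=1;b=2&c=x+y"));  // {a=1, b=2, c=x y}
    }
}
```

The same input string gives two different results depending on which part of the URL it is meant for, which is why %20 is the safe choice when in doubt.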

Now one additional comment. You said:
..
> > about SEOability and user-friendliness - this especially concerns path
> > with international characters in URLs, e.g. http://site/pathąčęė
>
> That is up to the browser how to show those URLs. Many browsers have a
> setting how to display such URLs.  E.g. try to browse non-English
> Wikipedia for an example of i18n addresses.
..

I think that the above is a bit confusing.
The "site" (or hostname) part of the URL is subject to a different encoding than the
path part (/pathąčęė).  The path part must be URL-encoded (percent-encoded), but the
hostname part uses "punycode", see http://en.wikipedia.org/wiki/Punycode.
Just another example of the current mess with character sets and encodings...
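
A small Java sketch of the two encodings side by side. The class name, the method names and the host "bücher.example" are invented for illustration, and the path encoder assumes UTF-8 and is deliberately conservative: it escapes everything except unreserved characters and "/", which is stricter than the RFC requires.

```java
import java.net.IDN;
import java.nio.charset.StandardCharsets;

public class HostVersusPathSketch {

    // Hostname: each non-ASCII label is converted by IDNA to a Punycode
    // label carrying the "xn--" prefix.
    static String encodeHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    // Path: percent-encode every UTF-8 byte that is not an unreserved
    // character or "/". UTF-8 is an assumption made for this sketch.
    static String encodePath(String path) {
        StringBuilder out = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            boolean unreserved = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || "-._~".indexOf(c) >= 0;
            if (unreserved || c == '/') {
                out.append(c);
            } else {
                out.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file pure ASCII.
        String host = "b\u00FCcher.example";            // hypothetical IDN host
        String path = "/path\u0105\u010D\u0119\u0117";  // the path from the quoted URL

        System.out.println(encodeHost(host));  // xn--bcher-kva.example
        System.out.println(encodePath(path));  // /path%C4%85%C4%8D%C4%99%C4%97
    }
}
```

So what the user sees as one URL ends up on the wire in two unrelated encodings, one for the host and one for the path.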

I guess one has to have a first or last name containing so-called "diacritic" characters 
to really appreciate these issues.



