hc-dev mailing list archives

From Laura Werner <la...@lwerner.org>
Subject Re: Encoding of special characters in request URI
Date Thu, 10 Jul 2003 19:35:28 GMT
Oleg Kalnichevski wrote:

>This is one of many 'shady' areas of the HTTP spec. Basically there is
>no standard way for the client to communicate to the server what coding
>has been used to encode query parameters.
>
It's definitely shady.  I've seen two approaches used here.  In the
past, many internationalized applications would assume that the
percent-encoded non-ASCII characters in submitted URIs were in the same
character set as the page that was submitting the request.  So if you
know that you generated foo.jsp in Latin-5 (ISO-8859-9), then you assume
that any URI requests coming from foo.jsp should be treated as Latin-5
after being URL-decoded.  There's a paper on this technique floating
around somewhere, written by a guy I used to work with at IBM, but I
can't find it on the Web.
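
A minimal sketch of this first approach in Java, assuming Java 10 or
later (for the URLDecoder.decode overload that takes a Charset) and
assuming the server already knows which charset the page was generated
in.  The class and method names are hypothetical, just for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;

final class PageCharsetDecoder {
    // Hypothetical helper: decode a percent-encoded value using the
    // charset the submitting page is known to have been generated in.
    static String decode(String encoded, Charset pageCharset) {
        // URLDecoder applies form decoding, so '+' also becomes a space.
        return URLDecoder.decode(encoded, pageCharset);
    }
}
```

The guess matters: the byte 0xFE is ş in Latin-5 but þ in Latin-1, so
decode("%FE", Charset.forName("ISO-8859-9")) and the same call with
ISO-8859-1 give different strings for the same request.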

The more modern approach is to assume that the URI is always in UTF-8.
If URL-decoding leaves any non-ASCII bytes in it, then you run them
through a UTF-8 converter (UTF-8 to UTF-16 in the case of Java).
Here's a proposal on this:
http://www.w3.org/International/O-URL-and-ident.html.  If you follow the
links from there you'll find other useful pages such as
http://www.w3.org/International/questions/qa-forms-utf-8.html.
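
Roughly, the UTF-8 approach could look like the sketch below: undo the
%XX escapes to get raw bytes, then run the bytes through a strict UTF-8
decoder so that input which isn't really UTF-8 is rejected instead of
silently mangled.  The class and method names are made up for this
example, and it assumes the unescaped part of the URI is plain ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8UriDecoder {
    // Turn %XX escapes back into raw bytes; every other character is
    // assumed to be ASCII and is copied through unchanged.
    static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape");
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape");
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write((byte) c);
            }
        }
        return out.toByteArray();
    }

    // Interpret the decoded bytes as UTF-8; the result is an ordinary
    // Java String, which is UTF-16 internally.
    static String decodeUtf8(String s) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(percentDecode(s)))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("not valid UTF-8", e);
        }
    }
}
```

With this, decodeUtf8("caf%C3%A9") gives "café", while a Latin-style
single byte such as "%FE" is rejected, which is one way to notice a
client that isn't actually sending UTF-8.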

-- Laura

