hc-dev mailing list archives

From Michael Becke <be...@u.washington.edu>
Subject Re: Encoding of special characters in request URI
Date Fri, 11 Jul 2003 01:19:02 GMT
That's quite a handy reference.  Thank you for the info Laura.

Mike

On Thursday, July 10, 2003, at 03:35 PM, Laura Werner wrote:

> Oleg Kalnichevski wrote:
>
>> This is one of many 'shady' areas of the HTTP spec. Basically there is
>> no standard way for the client to communicate to the server what
>> encoding has been used to encode query parameters.
>>
> It's definitely shady.  I've seen two approaches used here.  In the 
> past, many internationalized applications would assume that the 
> non-ASCII encoded characters in submitted URIs were in the same 
> character set as the page that was submitting the request.  So if you 
> know that you generated foo.jsp in Latin-5, then you assume that any
> URI requests coming from foo.jsp should be treated as Latin-5 after
> being URL-decoded.  There's a paper on this technique floating around 
> somewhere, written by a guy I used to work with at IBM, but I can't 
> find it on the Web.
>
> The more modern approach is to assume that the URI is always in UTF-8.
> If there are any non-ASCII characters in it after URL-decoding, then
> you run it through a UTF-8 converter (UTF-8 to UTF-16 in the case of 
> Java).  Here's a proposal on this:  
> http://www.w3.org/International/O-URL-and-ident.html.  If you follow 
> the links from there you'll find other useful pages such as 
> http://www.w3.org/International/questions/qa-forms-utf-8.html.
>
> -- Laura
>
>
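For illustration, here is a minimal Java sketch of the two approaches Laura describes. It is not HttpClient code; the class and method names (QueryDecodeSketch, decodeAs, decodeAsUtf8) are made up for this example, and ISO-8859-1 stands in for "whatever charset the submitting page used". It assumes Java 10 or later for the URLDecoder.decode(String, Charset) overload. The point it tries to show is that the same percent-escaped bytes produce different strings depending on which charset the server assumes.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class QueryDecodeSketch {

    // Older approach: percent-decode, then interpret the bytes in the
    // charset the server believes the submitting page was written in.
    static String decodeAs(String rawValue, Charset pageCharset) {
        return URLDecoder.decode(rawValue, pageCharset);
    }

    // More modern approach: always interpret the decoded bytes as UTF-8.
    // The result is an ordinary Java String, which is UTF-16 internally.
    static String decodeAsUtf8(String rawValue) {
        return URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" followed by the two bytes C3 A9, which is e-acute in UTF-8.
        String raw = "caf%C3%A9";

        String asUtf8 = decodeAsUtf8(raw);
        String asLatin1 = decodeAs(raw, StandardCharsets.ISO_8859_1);

        // UTF-8 yields 4 characters; Latin-1 yields 5, because each byte
        // becomes its own character.
        System.out.println(asUtf8.length());
        System.out.println(asLatin1.length());
    }
}
```

If the client and server disagree about the charset, the server ends up with the five-character variant even though the user typed four characters, which is the ambiguity Oleg points out.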

