hc-httpclient-users mailing list archives

From yonik <ysee...@gmail.com>
Subject Re: application/x-www-form-urlencoded standard
Date Sat, 22 Dec 2007 15:20:27 GMT


Roland Weber wrote:
> 
> Hello Yonik,
> 
>> The standard that specifies URI encoding is
>> http://www.ietf.org/rfc/rfc3986.txt
>> Does anyone have a pointer that specifies that
>> application/x-www-form-urlencoded should be handled in the same manner?
> 
> http://www.w3.org/TR/html401/interact/forms.html#form-content-type
> 

Thanks... that standard doesn't specifically address how to handle non-ASCII
(Unicode) characters, and it references the older
http://www.ietf.org/rfc/rfc1738.txt, which doesn't address them either.
One *could* make the logical leap that since rfc3986 updates rfc1738, the
two-step encoding in section 2.5 of rfc3986 should now apply (encode the
characters as UTF-8 first, then percent-encode the resulting octets).
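
To make the rfc3986 section 2.5 reading concrete, here is a minimal Python
sketch of "UTF-8 first, then percent-encode the octets". The function name
percent_encode_utf8 is made up for illustration; urllib.parse.quote is the
standard-library call doing the escaping. This shows one interpretation only,
not something the HTML 4.01 text requires.

```python
from urllib.parse import quote


def percent_encode_utf8(text):
    """Encode text as UTF-8, then percent-encode the resulting octets.

    Hypothetical helper for illustration. Unreserved ASCII characters
    (letters, digits, '-', '.', '_', '~') pass through unchanged.
    """
    octets = text.encode("utf-8")   # step 1: characters -> UTF-8 octets
    return quote(octets, safe="")   # step 2: octets -> %HH escapes


print(percent_encode_utf8("\u00e9"))     # %C3%A9
print(percent_encode_utf8("caf\u00e9"))  # caf%C3%A9
```

Under this reading \u00e9 always becomes the two escapes %C3%A9, whatever
charset the request declares.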

However, one could also argue that this workaround only applies to URIs,
because a URI has no place to declare a character encoding.  Since we *can*
declare a charset for a POST body, it could also make sense to simply
leave \u00e9 alone (encode it per the declared charset).
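
As a rough illustration of this second reading, the Python sketch below leaves
\u00e9 unescaped and lets the declared charset decide which bytes go on the
wire. The names build_form_body and content_type are hypothetical, and the
sketch assumes the receiving server honors a charset parameter on this content
type, which is not guaranteed. Escaping of reserved ASCII characters is left
out to keep it short.

```python
def build_form_body(pairs, charset):
    """Join name=value pairs and encode the text in the declared charset.

    Hypothetical helper: non-ASCII characters are NOT percent-encoded,
    and reserved ASCII characters are not escaped either (omitted here).
    """
    text = "&".join(f"{name}={value}" for name, value in pairs)
    return text.encode(charset)


def content_type(charset):
    """Build the Content-Type header value that declares the body charset."""
    return f"application/x-www-form-urlencoded; charset={charset}"


latin1_body = build_form_body([("name", "caf\u00e9")], "iso-8859-1")
utf8_body = build_form_body([("name", "caf\u00e9")], "utf-8")

print(content_type("iso-8859-1"), latin1_body)  # body is b'name=caf\xe9'
print(content_type("utf-8"), utf8_body)         # body is b'name=caf\xc3\xa9'
```

The same character ends up as one raw byte or two depending on the header,
so the receiver has to read the charset before it can decode the body.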

It would be nice if the standards were actually updated to spell it out.

-Yonik

from http://www.w3.org/TR/html401/interact/forms.html#form-content-type
'''
application/x-www-form-urlencoded  

This is the default content type. Forms submitted with this content type
must be encoded as follows:

   1. Control names and values are escaped. Space characters are replaced by
`+', and then reserved characters are escaped as described in [RFC1738],
section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent
sign and two hexadecimal digits representing the ASCII code of the
character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
   2. The control names/values are listed in the order they appear in the
document. The name is separated from the value by `=' and name/value pairs
are separated from each other by `&'.
'''
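
The two steps quoted above can be sketched in Python as follows. The function
name form_urlencode is made up for illustration. Because the quoted text only
talks about ASCII codes, this sketch assumes ASCII-only input and raises
UnicodeEncodeError on anything else; that choice is mine, not the spec's.
Python's quote_plus also leaves '-', '.', '_' and '~' unescaped, which is a
little more lenient than "non-alphanumeric characters are replaced".

```python
import re
from urllib.parse import quote_plus


def form_urlencode(pairs):
    """Encode (name, value) pairs following the two quoted steps.

    Hypothetical helper, ASCII-only by assumption.
    """
    def escape(text):
        # Line breaks are represented as CR LF pairs before escaping.
        text = re.sub(r"\r\n|\r|\n", "\r\n", text)
        # Spaces become '+', other reserved characters become %HH.
        return quote_plus(text, encoding="ascii")

    # Pairs keep their document order; '=' joins name and value,
    # '&' separates the pairs.
    return "&".join(f"{escape(name)}={escape(value)}" for name, value in pairs)


fields = [("name", "a b"), ("q", "x&y=z"), ("note", "l1\nl2")]
print(form_urlencode(fields))  # name=a+b&q=x%26y%3Dz&note=l1%0D%0Al2
```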


-- 
View this message in context: http://www.nabble.com/application-x-www-form-urlencoded-standard-tp14464212p14470290.html
Sent from the HttpClient-User mailing list archive at Nabble.com.



