hc-dev mailing list archives

From "Jared Jacobs (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HTTPCLIENT-884) Charset omitted from UrlEncodedFormEntity Content-Type header
Date Mon, 30 Aug 2010 17:11:55 GMT

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904252#action_12904252 ]

Jared Jacobs commented on HTTPCLIENT-884:
-----------------------------------------

If you are sending simple ASCII content, then you can try specifying "US-ASCII" or "ISO-8859-1"
as the second argument to the UrlEncodedFormEntity constructor.
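
To illustrate why either charset name is safe for plain ASCII form data, here is a minimal JDK-only sketch. It uses java.net.URLEncoder rather than HttpClient itself, and the class and method names are illustrative only. Pure ASCII values produce the same encoded output under US-ASCII, ISO-8859-1 and UTF-8; the outputs only diverge once a non-ASCII character appears.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative helper, not part of HttpClient.
final class AsciiFormEncodingDemo {

    // Percent-encode one form value using the named charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Not expected: the charsets used below are required on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ascii = "hello world";
        // All three lines print: hello+world
        System.out.println(encode(ascii, "US-ASCII"));
        System.out.println(encode(ascii, "ISO-8859-1"));
        System.out.println(encode(ascii, "UTF-8"));

        String accented = "caf\u00e9"; // "café"
        System.out.println(encode(accented, "ISO-8859-1")); // caf%E9
        System.out.println(encode(accented, "UTF-8"));      // caf%C3%A9
    }
}
```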

Oleg, you may want to provide a way in UrlEncodedFormEntity to omit the "charset" parameter
from the Content-Type header. Apparently it's still common for servers not to parse media type
parameters correctly. For backwards compatibility, browser vendors decided not to specify the
charset in the Content-Type header (even though specifying it is arguably the most correct
approach and a common practice; just search the web). Instead, they give authors the option of
sending the character set as an extra "_charset_" parameter in the request's body, and that
practice made it into the HTML 5 spec:
http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data
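
As a rough sketch of the browser-style approach described above, the following JDK-only example builds a form body that carries the character set in a "_charset_" field while leaving the Content-Type header without a charset parameter. The class and method names are hypothetical, and this is not how UrlEncodedFormEntity behaves; it only shows what such a request body could look like.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HttpClient.
final class CharsetFieldFormDemo {

    // Header value sent without any "charset" parameter.
    static final String CONTENT_TYPE = "application/x-www-form-urlencoded";

    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encode the fields and append a "_charset_" field naming the charset used.
    static String buildBody(Map<String, String> fields, String charsetName) {
        Map<String, String> all = new LinkedHashMap<>(fields);
        all.put("_charset_", charsetName);
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : all.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(e.getKey(), charsetName))
                .append('=')
                .append(encode(e.getValue(), charsetName));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "caf\u00e9"); // "café"
        System.out.println("Content-Type: " + CONTENT_TYPE);
        System.out.println(buildBody(fields, "UTF-8"));
        // body: name=caf%C3%A9&_charset_=UTF-8
    }
}
```

A server that ignores media type parameters can still recover the encoding this way, provided it knows to look for the "_charset_" field.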

Unlike most media types, which are "owned" by IANA, the browser vendors and W3C own the non-standard
"application/x-www-form-urlencoded".

> Charset omitted from UrlEncodedFormEntity Content-Type header
> -------------------------------------------------------------
>
>                 Key: HTTPCLIENT-884
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-884
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient
>    Affects Versions: 4.0 Final
>         Environment: all
>            Reporter: Jared Jacobs
>            Priority: Minor
>             Fix For: 4.0.1, 4.1 Alpha1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> UrlEncodedFormEntity sets the Content-Type header to:
>    "application/x-www-form-urlencoded"
> It should set the header to:
>    "application/x-www-form-urlencoded; charset=" + charset
> As a result, content can be misinterpreted by the recipient (e.g. if the entity content includes multibyte Unicode characters encoded with the "UTF-8" charset).
> For a correct example of specifying the charset in the Content-Type header, see StringEntity.java.
> Here's the fix:
>     public UrlEncodedFormEntity (
>         final List <? extends NameValuePair> parameters, 
>         final String encoding) throws UnsupportedEncodingException {
>         super(URLEncodedUtils.format(parameters, encoding),  encoding);
> -        setContentType(URLEncodedUtils.CONTENT_TYPE);
> +        setContentType(URLEncodedUtils.CONTENT_TYPE + HTTP.CHARSET_PARAM +
> +            (encoding != null ? encoding : HTTP.DEFAULT_CONTENT_CHARSET));
>     }
>     public UrlEncodedFormEntity (
>         final List <? extends NameValuePair> parameters) throws UnsupportedEncodingException {
> -        super(URLEncodedUtils.format(parameters, HTTP.DEFAULT_CONTENT_CHARSET), 
> -            HTTP.DEFAULT_CONTENT_CHARSET);
> -        setContentType(URLEncodedUtils.CONTENT_TYPE);
> +        this(parameters, HTTP.DEFAULT_CONTENT_CHARSET);
>     }
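
For readers without the HttpClient sources at hand, the header value that the quoted fix produces can be sketched with plain JDK code. The constants below are stand-ins: they assume that HTTP.CHARSET_PARAM is "; charset=" and that HTTP.DEFAULT_CONTENT_CHARSET is "ISO-8859-1" in HttpCore 4.x, which should be checked against the actual library. The class name is hypothetical.

```java
// Stand-alone sketch of the Content-Type value built by the patched constructor.
final class ContentTypeWithCharsetDemo {

    static final String FORM_TYPE = "application/x-www-form-urlencoded";
    // Assumed value of HTTP.CHARSET_PARAM.
    static final String CHARSET_PARAM = "; charset=";
    // Assumed value of HTTP.DEFAULT_CONTENT_CHARSET.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Same null fallback as in the quoted fix.
    static String contentType(String encoding) {
        return FORM_TYPE + CHARSET_PARAM
                + (encoding != null ? encoding : DEFAULT_CHARSET);
    }

    public static void main(String[] args) {
        System.out.println(contentType("UTF-8"));
        // application/x-www-form-urlencoded; charset=UTF-8
        System.out.println(contentType(null));
        // application/x-www-form-urlencoded; charset=ISO-8859-1
    }
}
```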


