hc-dev mailing list archives

From Ortwin Glück (JIRA) <j...@apache.org>
Subject [jira] Resolved: (HTTPCLIENT-642) browser encoded UTF-8 character gets truncated by URI upon escaping
Date Sat, 10 Mar 2007 18:30:09 GMT

     [ https://issues.apache.org/jira/browse/HTTPCLIENT-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ortwin Glück resolved HTTPCLIENT-642.

    Resolution: Invalid


You have a form that accepts ISO-8859-1. But then your user enters a Chinese character which
CANNOT be represented in that encoding. That's why your browser resorts to representing it
as an HTML entity, "&#33021;", which it additionally HTML-encodes to "&amp;#33021;".
I know of no standard that describes this behaviour. It looks completely arbitrary. As a matter
of fact, entering non-ISO-8859-1 characters in such a form is not allowed and the result is not
well-defined. If you need Chinese characters, use a UTF-8 capable form.
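
To illustrate the encoding point, here is a minimal sketch using only JDK classes
(java.nio.charset.Charset and java.net.URLEncoder), not HttpClient. The class and method
names are made up for this example. It checks whether the character 能 (code point 33021)
can be encoded in a given charset, and shows what a UTF-8 form submission of the value
would look like once percent-encoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class CharsetCheck {

    /** Returns true if every character of text can be encoded in the named charset. */
    static boolean representable(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    /** Percent-encodes a form value as UTF-8 (application/x-www-form-urlencoded rules). */
    static String formEncodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ch = "\u80FD"; // the character 能, decimal code point 33021

        System.out.println(representable(ch, "ISO-8859-1")); // false
        System.out.println(representable(ch, "UTF-8"));      // true
        System.out.println(formEncodeUtf8("Ralf" + ch));     // Ralf%E8%83%BD
    }
}
```

With a UTF-8 form the browser has no reason to fall back to an HTML entity, so no "&" or "#"
ends up inside the parameter value in the first place.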

Now, in your example the url variable contains a # character which is NOT meant as the separator
for the reference (anchor) part of the URL. The # character is a reserved URI character for
that purpose. So it must be escaped (as %23) when used inside a GET parameter value like in this
example. This means that this URL is not properly escaped. The URI class' behaviour is correct.
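
The effect of the unescaped # can be sketched with the JDK's own java.net.URI, which is used
here only as a stand-in for HttpClient's URI class (the two are different classes, and the
helper names below are invented for this example). Everything after the first # is parsed as
the fragment, so it is no longer part of the query. Escaping the parameter value before
building the URL keeps it intact.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class HashEscaping {

    static final String PREFIX = "/hp/index.php?address=addr&email=hauser@acm.org&name=";

    /** Raw (still escaped) query component, or null if the URL has none. */
    static String queryOf(String url) {
        return URI.create(url).getRawQuery();
    }

    /** Raw (still escaped) fragment component, or null if the URL has none. */
    static String fragmentOf(String url) {
        return URI.create(url).getRawFragment();
    }

    /** Escapes a single parameter value so that '&', '#' and ';' become %26, %23 and %3B. */
    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The URL from the report: the '#' starts the fragment.
        String unescaped = PREFIX + "Ralf&amp;#33021;";
        System.out.println(queryOf(unescaped));    // address=addr&email=hauser@acm.org&name=Ralf&amp;
        System.out.println(fragmentOf(unescaped)); // 33021;

        // Escaping the value first: no fragment, nothing is cut off.
        String escaped = PREFIX + encodeValue("Ralf&amp;#33021;");
        System.out.println(queryOf(escaped));
        System.out.println(fragmentOf(escaped));   // null
    }
}
```

Note that URLEncoder applies form-encoding rules (a space becomes "+", for instance), which
is suitable for query parameter values but not for escaping a whole URL.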


> browser encoded UTF-8 character gets truncated by URI upon escaping
> -------------------------------------------------------------------
>                 Key: HTTPCLIENT-642
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-642
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>    Affects Versions: 3.0.1
>            Reporter: Ralf Hauser
> A Mozilla GET request from an ISO-8859-1 form, where a user inadvertently entered
> a Chinese character, arrives at my Tomcat like
>     String url = "/hp/index.php?address=addr&email=hauser@acm.org&name=Ralf&amp;#33021;";
> with the Chinese character 能 being encoded as &amp;#33021;
>     URI uri = new URI(url, false, "ISO-8859-1");
>     GetMethod httpGet = new GetMethod(uri.getEscapedURI());
>     log.debug(httpGet.getURI());
> which logs
>     "/hp/index.php?address=addr&email=hauser@acm.org&name=Ralf&amp;"
> How should I deal with that until v4 is out? Will that no longer happen there?
> See also HTTPCLIENT-577.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

