hc-dev mailing list archives

From Jaime Hablutzel Egoavil <hablutz...@gmail.com>
Subject Re: Problem parsing non-ASCII in query component
Date Tue, 27 Dec 2016 15:42:54 GMT
From RFC 3986:
>
> When a new URI scheme defines a component that represents textual
> data consisting of characters from the Universal Character Set [UCS],
> the data should first be encoded as octets according to the UTF-8
> character encoding [STD63]; then only those octets that do not
> correspond to characters in the unreserved set *should be* percent-encoded.
> For example, the character A would be represented as "A",
> the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
> as "%C3%80", and the character KATAKANA LETTER A would be represented
> as "%E3%82%A2".


As you can see, it says "should", so it seems to me that percent-encoding
non-ASCII characters is not an obligation.
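
To make the quoted rule concrete, here is a minimal sketch of it using only
the JDK: encode the text as UTF-8, then percent-encode every octet that is
not in the unreserved set. The class and method names (PercentEncodeSketch,
percentEncode) are made up for illustration and are not part of HttpClient.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncodeSketch {
    // Unreserved set from RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Hypothetical helper: UTF-8 encode, then percent-encode non-unreserved octets
    static String percentEncode(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF; // treat the byte as unsigned
            if (octet < 0x80 && UNRESERVED.indexOf((char) octet) >= 0) {
                out.append((char) octet);
            } else {
                out.append(String.format("%%%02X", octet));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("A"));      // A
        System.out.println(percentEncode("\u00C0")); // %C3%80    (LATIN CAPITAL LETTER A WITH GRAVE)
        System.out.println(percentEncode("\u30A2")); // %E3%82%A2 (KATAKANA LETTER A)
    }
}
```

The three calls in main reproduce the three examples given in the RFC text.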

A real example where this problem arises is Firefox invoking custom URI
handlers. For example, if you have something like this in an HTML page:

<a href="myuri:?foo=b%C3%A1r">Invoke myuri handler</a>

The URI handler application will receive

myuri:?foo=bár

Then, when parsing the query component, HttpClient fails to decode that
parameter value correctly.
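
As a rough illustration of what I think is going on, the following JDK-only
sketch mimics a decoder that copies each non-escaped char into a byte buffer
with a (byte) cast and then decodes the buffer as UTF-8. It is not the actual
HttpClient code, only an approximation of the narrowing conversion mentioned
in my first message. It also includes a hypothetical workaround
(escapeNonAscii, a name I made up) that percent-encodes non-ASCII characters
before the string is handed to a parser.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class NarrowingSketch {
    // Approximates the suspected behaviour: every char is narrowed to one byte
    static String decodeWithNarrowing(String s) {
        ByteBuffer bb = ByteBuffer.allocate(s.length());
        for (int i = 0; i < s.length(); i++) {
            bb.put((byte) s.charAt(i)); // U+00E1 becomes the lone byte 0xE1
        }
        bb.flip();
        // 0xE1 followed by 'r' is malformed UTF-8, so it is replaced by U+FFFD
        return StandardCharsets.UTF_8.decode(bb).toString();
    }

    // Hypothetical workaround: percent-encode only the non-ASCII code points
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                String ch = new String(Character.toChars(cp));
                for (byte b : ch.getBytes(StandardCharsets.UTF_8)) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String query = "foo=b\u00E1r"; // the raw 'á' as received by the URI handler
        System.out.println(decodeWithNarrowing(query)); // foo=b\uFFFDr (replacement character)
        System.out.println(escapeNonAscii(query));      // foo=b%C3%A1r
    }
}
```

The workaround leaves ASCII characters untouched, including reserved ones
such as '=' and '&', so it assumes the rest of the query is already well
formed.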





On Tue, Dec 27, 2016 at 10:11 AM, Oleg Kalnichevski <olegk@apache.org>
wrote:

> On Sat, 2016-12-24 at 18:26 -0500, Jaime Hablutzel Egoavil wrote:
> > Currently something like this:
> >
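> > import java.nio.charset.StandardCharsets;
> > import java.util.List;
> > import org.apache.http.NameValuePair;
> > import org.apache.http.client.utils.URLEncodedUtils;
> >
> > public class ProblemWithNonAscii {
> >     public static void main(String[] args) {
> >         // Parse a query string containing a raw (not percent-encoded) 'á'
> >         List<NameValuePair> pairs =
> >                 URLEncodedUtils.parse("foo=bár", StandardCharsets.UTF_8);
> >         System.out.println(pairs);
> >     }
> > }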
> >
> > produces this output:
> >
> > [foo=b�r]
> >
> > Where the 'á' character has been scrambled.
> >
> > I can see that this is related to the following narrowing primitive
> > conversion,
> > https://github.com/apache/httpclient/blob/4.5.2/httpclient/src/main/java/org/apache/http/client/utils/URLEncodedUtils.java#L570
> >
> > This is a bug, isn't it?
> >
>
> Jaime,
>
> URL encoded content is not supposed to have non-ASCII characters in the
> first place, is it not?
>
> Oleg


-- 
Jaime Hablutzel -  RPC 994690880
