hc-dev mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: Problem parsing non-ASCII in query component
Date Tue, 27 Dec 2016 16:19:59 GMT
On Tue, 2016-12-27 at 10:42 -0500, Jaime Hablutzel Egoavil wrote:
> From RFC 3986:
> >
> >
> > When a new URI scheme defines a component that represents textual
> > data consisting of characters from the Universal Character Set [UCS],
> > the data should first be encoded as octets according to the UTF-8
> > character encoding [STD63]; then only those octets that do not
> > correspond to characters in the unreserved set *should be* percent-encoded.
> > For example, the character A would be represented as "A",
> > the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
> > as "%C3%80", and the character KATAKANA LETTER A would be represented
> > as "%E3%82%A2".
> 
> 
> As you can see it says "should" so it seems to me that it is not an
> obligation to percent encode non-ASCII.
> 
> A real example where this problem arises is Firefox invoking custom
> URI handlers. For example, if you have something like this in an HTML page:
> 
> <a href="myuri:?foo=b%C3%A1r">Invoke myuri handler</a>
> 
> The URI handler application will receive
> 
> myuri:?foo=bár
> 
> Then, during query component parsing HttpClient will fail to parse that
> parameter value.
> 

Both HTTP/1.1 and HTTP/2 require message head elements including the
request URI to be ASCII only.
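
One way to meet that requirement on the sending side is to percent-encode
the value as UTF-8 before it goes into the query component. The following
is only a minimal sketch using JDK classes, not HttpClient API; the class
and method names are made up for illustration. Note that URLEncoder follows
form-encoding rules, so a space becomes "+".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper (JDK only, not part of HttpClient).
class QueryValueEncodingSketch {

    // Percent-encodes a query value as UTF-8 so the result is ASCII only.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encode(): percent-decodes and interprets the octets as UTF-8.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "b\u00e1r" is "bar" with LATIN SMALL LETTER A WITH ACUTE.
        System.out.println("foo=" + encode("b\u00e1r")); // foo=b%C3%A1r
        // The two non-ASCII examples from the RFC 3986 passage quoted above.
        System.out.println(encode("\u00c0")); // %C3%80
        System.out.println(encode("\u30a2")); // %E3%82%A2
        // Decoding restores the original value.
        System.out.println(decode("b%C3%A1r").equals("b\u00e1r")); // true
    }
}
```

A query built this way is pure ASCII, so parsing it with UTF-8 as the
charset should give back the original value.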

Oleg

> 
> 
> 
> 
> On Tue, Dec 27, 2016 at 10:11 AM, Oleg Kalnichevski <olegk@apache.org>
> wrote:
> 
> > On Sat, 2016-12-24 at 18:26 -0500, Jaime Hablutzel Egoavil wrote:
> > > Currently something like this:
> > >
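> > > import java.nio.charset.StandardCharsets;
> > > import java.util.List;
> > >
> > > import org.apache.http.NameValuePair;
> > > import org.apache.http.client.utils.URLEncodedUtils;
> > >
> > > public class ProblemWithNonAscii {
> > >     public static void main(String[] args) {
> > >         // The query string contains a raw (not percent-encoded) 'á'.
> > >         List<NameValuePair> pairs = URLEncodedUtils.parse("foo=bár", StandardCharsets.UTF_8);
> > >         System.out.println(pairs);
> > >     }
> > > }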
> > >
> > > produces this output:
> > >
> > > [foo=b�r]
> > >
> > > Where the 'á' character has been scrambled.
> > >
> > > I can see that this is related to the following narrowing primitive
> > > conversion:
> > > https://github.com/apache/httpclient/blob/4.5.2/httpclient/src/main/java/org/apache/http/client/utils/URLEncodedUtils.java#L570
> > >
> > > This is a bug, isn't it?
> > >
> >
> > Jaime,
> >
> > URL encoded content is not supposed to have non-ASCII characters in the
> > first place, is it not?
> >
> > Oleg
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
> > For additional commands, e-mail: dev-help@hc.apache.org
> >
> >
> 
> 
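
For reference, the scrambling reported at the bottom of the thread can be
reproduced with JDK classes alone. This is a sketch of the general effect
of a char-to-byte narrowing conversion followed by UTF-8 decoding, not a
copy of the URLEncodedUtils code; the class and method names are
hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (JDK only, not HttpClient code).
class NarrowingConversionSketch {

    // Narrows every char to a single byte (keeping only the low 8 bits),
    // then decodes the resulting bytes as UTF-8.
    static String narrowThenDecode(String s) {
        byte[] raw = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            raw[i] = (byte) s.charAt(i); // narrowing primitive conversion
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E1 narrows to the single byte 0xE1. On its own that byte is
        // not valid UTF-8, so the decoder substitutes U+FFFD for it.
        String result = narrowThenDecode("b\u00e1r");
        System.out.println(result.equals("b\ufffdr")); // true
        // Pure ASCII input survives unchanged.
        System.out.println(narrowThenDecode("bar")); // bar
    }
}
```

Percent-encoded input contains only ASCII characters, so it is not
affected by this kind of narrowing.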




