hc-dev mailing list archives

From Julian Reschke <julian.resc...@gmx.de>
Subject Re: Problem parsing non-ASCII in query component
Date Tue, 27 Dec 2016 16:51:24 GMT
On 2016-12-27 17:19, Oleg Kalnichevski wrote:
> On Tue, 2016-12-27 at 10:42 -0500, Jaime Hablutzel Egoavil wrote:
>> From RFC 3986:
>>>
>>>
>>> When a new URI scheme defines a component that represents textual
>>> data consisting of characters from the Universal Character Set [UCS],
>>> the data should first be encoded as octets according to the UTF-8
>>> character encoding [STD63]; then only those octets that do not
>>> correspond to characters in the unreserved set *should be* percent-encoded.
>>> For example, the character A would be represented as "A",
>>> the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
>>> as "%C3%80", and the character KATAKANA LETTER A would be represented
>>> as "%E3%82%A2".
>>
>>
>> As you can see it says "should" so it seems to me that it is not an
>> obligation to percent encode non-ASCII.
>>
>> A real example where this problem arises is Firefox invoking custom
>> URI handlers, for example, if you have something like this in an HTML page:
>>
>> <a href="myuri:?foo=b%C3%A1r">Invoke myuri handler</a>
>>
>> The URI handler application will receive
>>
>> myuri:?foo=bár
>>
>> Then, during query component parsing HttpClient will fail to parse that
>> parameter value.
>>
>
> Both HTTP/1.1 and HTTP/2 require message head elements, including the
> request URI, to be ASCII only.
>
> Oleg

The same is true for URIs in general, as can easily be derived from the 
ABNF in RFC 3986.
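The derivation from the ABNF can be made concrete. Below is a minimal Java sketch that encodes only the `query` rule of RFC 3986 (section 3.4: `query = *( pchar / "/" / "?" )`, with `pchar` built from unreserved, pct-encoded, sub-delims, ":" and "@") as a regular expression. The class and method names are hypothetical and are not part of HttpClient. A raw "á" matches no alternative of the grammar, so the unencoded query is rejected while its UTF-8 percent-encoded form is accepted.

```java
import java.util.regex.Pattern;

public class QueryCheck {
    // RFC 3986, section 3.4:  query = *( pchar / "/" / "?" )
    // pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
    // unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    // sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    // pct-encoded = "%" HEXDIG HEXDIG
    private static final Pattern QUERY = Pattern.compile(
            "(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*");

    // Hypothetical helper: true only if the whole string matches the query rule.
    public static boolean isValidQuery(String query) {
        return QUERY.matcher(query).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidQuery("foo=b%C3%A1r")); // true
        System.out.println(isValidQuery("foo=b\u00e1r")); // false: raw "á" is not in the grammar
    }
}
```

The non-ASCII letter is written as the escape `\u00e1` so the example does not depend on the source-file encoding used by the compiler.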

Best regards, Julian
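The RFC 3986 passage quoted in this thread (section 2.5) describes what a URI producer is expected to do: encode text as UTF-8, then percent-encode the resulting octets. A handler that receives a raw string such as `myuri:?foo=bár` could apply that step itself before passing the string to a strict parser. The Java sketch below is one possible repair under a deliberately narrow assumption: only non-ASCII code points are encoded, and ASCII characters (including existing `%XX` escapes, but also illegal ones such as a space) are copied through untouched. The class and method names are hypothetical, not HttpClient API.

```java
import java.nio.charset.StandardCharsets;

public class NonAsciiEncoder {
    /**
     * Hypothetical helper: percent-encodes every non-ASCII code point as its
     * UTF-8 octets (RFC 3986, section 2.5). ASCII characters, including any
     * existing "%XX" escapes, are copied through unchanged.
     */
    public static String encodeNonAscii(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        raw.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp); // ASCII: leave as-is
            } else {
                // Encode the whole code point, so supplementary characters
                // become four octets rather than two broken surrogates.
                byte[] octets = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : octets) {
                    // Upper-case hex digits, as RFC 3986 section 2.1 recommends.
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeNonAscii("myuri:?foo=b\u00e1r")); // myuri:?foo=b%C3%A1r
        System.out.println(encodeNonAscii("\u00c0"));               // %C3%80
        System.out.println(encodeNonAscii("\u30a2"));               // %E3%82%A2
    }
}
```

The last two calls reproduce the RFC's own examples (LATIN CAPITAL LETTER A WITH GRAVE and KATAKANA LETTER A). Because existing escapes are left alone, running the helper over an already-encoded URI does not double-encode it.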

