httpd-dev mailing list archives

From William A Rowe Jr <wr...@rowe-clan.net>
Subject Re: StrictURI in the wild [Was: Backporting HttpProtocolOptions survey]
Date Mon, 12 Sep 2016 21:38:55 GMT
On Mon, Sep 12, 2016 at 3:06 PM, William A Rowe Jr <wrowe@rowe-clan.net>
wrote:

> On Mon, Sep 12, 2016 at 10:49 AM, William A Rowe Jr <wrowe@rowe-clan.net>
> wrote:
>
>> On Mon, Aug 29, 2016 at 1:04 PM, Ruediger Pluem <rpluem@apache.org>
>> wrote:
>>
>>>
>>> On 08/29/2016 06:25 PM, William A Rowe Jr wrote:
>>> > Thanks all for the feedback. Status and follow-up questions inline
>>> >
>>> > On Thu, Aug 25, 2016 at 10:02 PM, William A Rowe Jr <wrowe@rowe-clan.net> wrote:
>>> >
>>> >     4. Should the next 2.4/2.2 releases default to Strict[URI] at all?
>>> >
>>> >     Real world direct observation especially appreciated from actual
>>> >     deployments.
>>> >
>>> > Strict (and StrictURI) remain the default.
>>>
>>> StrictURI as a default only makes sense if we have our own house in
>>> order (see above); otherwise it should be opt-in.
>>
>>
>> So it's not only our house (our %3B encoding in httpd isn't a showstopper
>> here), but also whether widely used browsers and other user-agent tooling
>> have their houses in order, so I started to study current browser behavior.
>> The applicable spec is https://tools.ietf.org/html/rfc3986#section-3.3
>>
>>
>> The character '|' is also invalid. However, Firefox fails to follow the spec
>> again here (although Chrome gets it right).
>>
>> With respect to these characters, recall this 18-year-old document; its
>> last paragraph describes the rationale:
>> https://tools.ietf.org/html/rfc2396.html#section-2.4.3
>>
>>    unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
>>
>>    Data corresponding to excluded characters must be escaped in order to
>>    be properly represented within a URI.
>>
>>
>> While these characters were labeled 'unsafe', then 'unwise', and are now
>> disallowed by omission from RFC 3986, the 'must' designation couldn't have
>> been any clearer. We've had this right in httpd for two decades.
>>
>> The second paragraph of https://tools.ietf.org/html/rfc3986#appendix-D.1
>> goes into some detail about this change. While it is hard to parse, the
>> paragraph states that '[' and ']' were once invalid, are now reserved (as
>> delimiters for an IP literal in the host), and remain disallowed in path
>> segments and all other uses.
>>
>> The upshot: right now StrictURI accepts '[' and ']', but this won't survive
>> a rewrite of the APR parser operating with a 'strict' toggle. StrictURI
>> does not accept '|'. The remaining question is what to do, if anything,
>> about carving out a specific exception here due to modern Firefox issues.
>>
>> Thoughts/Comments/Additional test data?  TIA!
>>
>>
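
The path grammar referenced above (RFC 3986 section 3.3: `segment = *pchar`, `pchar = unreserved / pct-encoded / sub-delims / ":" / "@"`) can be checked without any exception list. The following is a minimal sketch, not APR's or httpd's actual parser; the function names `is_pchar`, `is_hex_ascii` and `path_is_strict_rfc3986` are hypothetical. It validates only the path of an origin-form request-target (query and fragment assumed already split off), so '|', '[' and ']' are all rejected there, consistent with the Appendix D.1 reading above.

```c
#include <stdbool.h>
#include <string.h>

/* ASCII-only class tests, independent of the current locale. */
static bool is_alnum_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}

static bool is_hex_ascii(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') ||
           (c >= 'a' && c <= 'f');
}

/* Hypothetical helper: may c appear literally in a path segment?
 * pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
 * (the pct-encoded form is handled by the caller). */
static bool is_pchar(unsigned char c)
{
    if (is_alnum_ascii(c))
        return true;
    if (c == '\0')
        return false;                     /* strchr would match the NUL */
    return strchr("-._~"                  /* rest of unreserved */
                  "!$&'()*+,;="           /* sub-delims */
                  ":@", c) != NULL;
}

/* Hypothetical: validate an absolute path with query and fragment
 * already removed. Accepts '/' separators, pchar, and well-formed %XX. */
static bool path_is_strict_rfc3986(const char *path)
{
    const unsigned char *p = (const unsigned char *)path;

    if (*p != '/')                        /* absolute path must start with '/' */
        return false;
    while (*p) {
        if (*p == '/' || is_pchar(*p)) {
            p++;
        } else if (*p == '%' && is_hex_ascii(p[1]) && is_hex_ascii(p[2])) {
            p += 3;                       /* pct-encoded = "%" HEXDIG HEXDIG */
        } else {
            return false;                 /* e.g. '|', '[', ']', '#', '{', ' ' */
        }
    }
    return true;
}
```

Character classes are tested with explicit ASCII ranges instead of `isalnum()` so the result does not depend on the current locale.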
It really seems that if a major client is not handling "|" correctly, we need
to carve out an exception, as well as disallow the "#" fragment gen-delim,
which is not allowed to appear in a request-target. For example:

--- server/gen_test_char.c (revision 1760444)
+++ server/gen_test_char.c (working copy)
@@ -143,10 +143,11 @@
          * and unreserved (2.3) that are possible somewhere within a URI.
          * Spec requires all others to be %XX encoded, including obs-text.
          */
-        if (c && (strchr("%"                              /* pct-encode */
-                         ":/?#[]@"                        /* gen-delims */
-                         "!$&'()*+,;="                    /* sub-delims */
-                         "-._~", c) || apr_isalnum(c))) { /* unreserved */
+        if (c && (strchr("%"                           /* pct-encode */
+                         ":/?[]@"                      /* gen-delims - "#" */
+                         "!$&'()*+,;="                 /* sub-delims */
+                         "-._~"                        /* unreserved */
+                         "|", c) || apr_isalnum(c))) { /* permit firefox bug */
             flags |= T_URI_RFC3986;
         }
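
The patched condition can be exercised outside the build. This sketch copies the `strchr()` expression from the diff into a standalone table builder; `uri_ok`, `build_uri_table` and `uri_chars_ok` are hypothetical names, a single 0/1 byte stands in for the `T_URI_RFC3986` flag bit, and an ASCII range check replaces `apr_isalnum()` so there is no APR dependency.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for one flag bit of httpd's generated
 * character table: 1 = allowed in a StrictURI request-target. */
static unsigned char uri_ok[256];

/* Same condition as the patched gen_test_char.c: "#" removed from the
 * gen-delims, "|" added as the Firefox exception. */
static void build_uri_table(void)
{
    for (int c = 0; c < 256; c++) {
        bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                     (c >= '0' && c <= '9');   /* replaces apr_isalnum() */
        uri_ok[c] = (c && (strchr("%"             /* pct-encode */
                                  ":/?[]@"        /* gen-delims - "#" */
                                  "!$&'()*+,;="   /* sub-delims */
                                  "-._~"          /* unreserved */
                                  "|", c)         /* permit firefox bug */
                           || alnum)) ? 1 : 0;
    }
}

/* Character-level check of a whole request-target against the table. */
static bool uri_chars_ok(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if (!uri_ok[*p])
            return false;
    }
    return true;
}
```

Like the table in the patch, this is a per-character test only: it cannot tell that '[' and ']' belong only in the host, or that '?' starts the query, which is why '[' and ']' still pass here until segment-by-segment validation exists.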


So my only remaining question is what to do about the others in the
not-mentioned, entirely invalid set: <"> | "<" | ">" | "\" | "^" | "`" | "{" | "}".
So far the modern browsers reviewed handle these correctly, but if anyone still
has old browsers installed for testing/validation, double-checking the test
queries would still be a big help, as would confirming behavior on Safari,
Dolphin, etc.

Are we OK with adding one invalid-character exception for Firefox to StrictURI
(and later, two more for "[" and "]" when we code segment-by-segment validation
into APR) while still disallowing the rest of this list?
