httpd-dev mailing list archives

From "Roy T. Fielding" <field...@gbiv.com>
Subject Re: StrictURI in the wild [Was: Backporting HttpProtocolOptions survey]
Date Wed, 14 Sep 2016 20:42:31 GMT
> On Sep 14, 2016, at 6:28 AM, William A Rowe Jr <wrowe@rowe-clan.net> wrote:
> 
> On Tue, Sep 13, 2016 at 5:07 PM, Jacob Champion <champion.p@gmail.com> wrote:
> On 09/13/2016 12:25 PM, Jacob Champion wrote:
> What is this? Is this the newest "there are a bunch of almost-right
> implementations so let's make yet another standard in the hopes that it
> won't make things worse"? Does anyone know the history behind this spec?
> 
> (My goal in asking this question is not to stare and point and laugh, but more to figure
> out whether we are skating to where the puck is going. It would be nice for users to know
> which specification StrictURI is being strict about.)
> 
> RFC3986, as incorporated by reference and expanded upon in RFC7230.
> 
> IP, TCP, HTTP and its data and framing are defined by the IETF. HTTP's
> definition depends on the meaning of many things, including ASCII, URI
> syntax, etc.; see its table of citations. The things it depends on simply
> can't be moving targets, any more than the definitions that the TCP
> protocol is dependent upon. The IETF process is to correct a broken
> underlying spec with a newly revised spec subject to peer review, and 
> then update the consuming specs to leverage the changes in the 
> underlying, where necessary (in some cases the revised underlying
> spec, once applied, has no impact on the consuming spec.)
> 
> HTML folks use URLs, and therefore forked the spec they perceived as
> too rigid and inflexible. In fact, it wasn't, but it appears so if you read the
> spec as requiring -users- to -type- valid URIs, which was never the case.
> Although it gets prickly if you consider handling badly authored href= links 
> in html. HTML became a "living spec" subject to perpetual evolution;
> this results in a state where all implementations are perpetually broken.
> But the key take-away is that the whatwg URL spec does not and cannot
> supersede RFC3986 for the purposes of RFC7230. Rather than improve
> the underlying spec, the group decided to overlay an unrelated spec.
> 
> https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/ does one
> decent job explaining some of this. Google "URI whatwg vs. ietf" for
> an excessively long list of references.
> 
> So in short, the whatwg spec describes URIs anywhere someone wants
> to apply their definition; HTML5 is based upon this. The wire protocol
> of talking to an http: schema server is defined by RFC7230, which 
> subordinates to the RFC3986 definition of a URI. How you choose to 
> apply these two specs depends on your position in the stack.

I don't consider the WHATWG to be a standards organization, nor should
anyone else. It is just a selective group (a clique) with opinions about
software that they didn't write and a desire to document it in a way that
excludes the interests of everyone other than browser developers.

The main distinction between the WHATWG "URL standard" (it isn't) and
the IETF URI standard (it is, encompassing URL and URN) is that HTML5
needs to define the url object in the DOM (which is basically an object containing
a parsed URI reference), whereas the IETF needs to define a grammar for
the set of uniform identifiers believed to be interoperable on the Internet.

Obviously, if one spec wants to define everything a user might input as a
reference and call that "URL", while the other wants to define the interoperable
identifier output after uniform parsing of a reference relative to a base URI
as a "URL", the two specs are not going to be compatible.

Do you think the empty string ("") is a URL?  I don't.
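
As a minimal illustration (using Python's urllib.parse, whose urljoin
follows the RFC 3986 reference-resolution algorithm): the empty string
parses only as a relative reference, meaningful solely against a base URI.

    from urllib.parse import urljoin

    base = "http://example.com/a/b"
    # RFC 3986 sec. 5.2: "" is a valid *relative reference*, and resolving
    # it against a base simply yields the base; it is not a URI on its own.
    print(urljoin(base, ""))   # -> http://example.com/a/b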

A normal author would have used two different terms to define the two
different things (actually, four different things, since the URL spec also uses
url to describe two other things related to URL processing). The IETF took a
different path, 23 years ago, when it coined the term URL instead of just
defining them as "WWW Addresses" or universal document identifiers.

Instead of making a rational effort to document references in HTML, the
WHATWG decided to go on an ego trip about what "real developers" call
a "URL", and then embarked on yet another political effort to reject IETF
standards (that represent the needs of all Internet users, not just
browser developers) in favor of their own "living standards" that only
reflect a figment of the author's imagination (not implementations).

Yes, a user agent will send invalid characters in a request URI.  That is a bug
in the user agent.  Even if every browser chose to do it, that is still a bug in
the browser (not a bug in the spec). The spec knows that those addresses
are unsafe on the command-line and therefore unable to be properly
handled by many parts of the Internet that are not browsers, whereas
the correctly encoded equivalent is known to be interoperable. Hence,
the real standard requires that they be sent in an interoperable form.
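
For instance, a minimal sketch using Python's standard library (the exact
safe-character set depends on which URI component is being encoded):

    from urllib.parse import quote

    # A space and a non-ASCII octet are invalid in a request URI;
    # percent-encoding per RFC 3986 yields the interoperable equivalent.
    print(quote("/docs/caf\u00e9 menu.html"))
    # -> /docs/caf%C3%A9%20menu.html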

Anyway, we have to be careful when testing to note that what a user agent
does with a reference is often dependent on the context in which it receives
the reference.  The requirements of the URI spec are mostly about generation
of an interoperable URI, rather than making a request containing an arbitrary
URI reference. Hence, some browsers will only encode a URI properly when
they have control over the generation process, leaving the responsibility for
proper encoding of other references to the authors creating those links.
Thus, a user agent might encode the request URI differently if the reference
is received in an href than it would when the same string is typed in the
address dialog, constructed via javascript, or stored within a bookmark.
Likewise, some user agents (like curl and wget) will send invalid characters
in a request URI when those characters are deliberately supplied, e.g. for pen testing.
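
To make that distinction concrete (again a sketch with Python's urllib.parse):
resolving a reference against a base merely combines the two; it neither
validates nor re-encodes, so proper encoding has to happen at generation time.

    from urllib.parse import urljoin, quote

    base = "http://example.com/docs/"
    # A badly authored href passes through resolution untouched.
    print(urljoin(base, "caf\u00e9 menu.html"))
    # -> http://example.com/docs/café menu.html   (still invalid on the wire)

    # Proper encoding is a separate, deliberate generation step.
    print(urljoin(base, quote("caf\u00e9 menu.html")))
    # -> http://example.com/docs/caf%C3%A9%20menu.html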

RFCs never limit what a component can send, since conformance is voluntary.
What they limit is the range of chaos that is considered interoperable, with
an expectation that a normal sender will want to conform, for its own sake,
and a normal recipient can feel free to ignore or error on non-conformance.
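
What a strict recipient's check might look like, roughly (an illustrative
sketch only, not Apache's actual StrictURI implementation): accept only the
characters RFC 3986 allows in a path or query, with well-formed percent
escapes, and feel free to answer 400 Bad Request otherwise.

    import re

    # pchar / "/" / "?" from RFC 3986; "%" is valid only when it
    # introduces a two-hex-digit escape, handled by the alternation.
    _ALLOWED = re.compile(
        r"^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*$")

    def strict_request_target_ok(target: str) -> bool:
        return bool(_ALLOWED.match(target))

    print(strict_request_target_ok("/a/b?x=1"))    # True
    print(strict_request_target_ok("/caf\u00e9"))  # False: raw non-ASCII
    print(strict_request_target_ok("/a%2"))        # False: broken escape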

....Roy

