httpd-dev mailing list archives

From "Roy T. Fielding" <field...@gbiv.com>
Subject Re: Wrong etag sent with mod_deflate
Date Sun, 10 Dec 2006 06:51:20 GMT

On Dec 9, 2006, at 6:23 AM, Justin Erenkrantz wrote:
> On 12/9/06, Ruediger Pluem <rpluem@apache.org> wrote:
>> Would the following patch address all your points for a CE
>> mod_deflate filter?
>
> No - this patch breaks conditional GETs which is what I'm against.

Right, the hard part is fixing the effect on ap_meets_conditions, but
it is nice to have an example of the easy part.  Thanks Ruediger.

> See the problem here is that you have to teach ap_meets_conditions()
> about this.  An ETag of "1234-gzip" needs to also satisfy a
> conditional request when the ETag when ap_meets_conditions() is run is
> "1234".  In other words, ap_meets_conditions() also needs to strip
> "-gzip" if it is present before it does the ETag comparison.  But, the
> issue is that there is no real way for us to implement this without a
> butt-ugly hack.

That's not true. The content generator and filters need to be constructed
for a 304 response in just the way that they would be for a 200 response,
and then they must produce the metadata for that response before
ap_meets_conditions() is even called.  That is the only valid way to
produce a 304 or HEAD response.

ap_meets_conditions is supposed to evaluate the metadata produced and
make its decision just prior to applying the method (sending the body).
It doesn't need to know anything about the etag structure -- it just
needs the same etag that would have been generated for a 200 response.
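
A rough sketch of that decision, in Python rather than httpd's C, may help
make the point concrete.  It is not the httpd implementation: the function
name none_match_matches is hypothetical, and it assumes the weak comparison
normally applied to If-None-Match on GET.  The only input that matters is
the final etag, that is, the one a 200 response would carry after all
filters have run.

```python
import re

# Matches one entity-tag, weak or strong, e.g. "1234-gzip" or W/"1234".
_ETAG = re.compile(r'(?:W/)?"[^"]*"')

def _opaque(tag: str) -> str:
    """Drop the weakness prefix; the quoted part is never interpreted."""
    return tag[2:] if tag.startswith("W/") else tag

def none_match_matches(if_none_match: str, final_etag: str) -> bool:
    """Return True when If-None-Match selects final_etag (so a GET gets 304).

    final_etag is assumed to be the etag a 200 response would carry,
    i.e. the value after every content-changing filter has run.
    """
    if if_none_match.strip() == "*":
        return True
    wanted = _opaque(final_etag)
    return any(_opaque(t) == wanted for t in _ETAG.findall(if_none_match))

# The client cached the deflated variant and revalidates it.
print(none_match_matches('"1234-gzip"', '"1234-gzip"'))  # final etag: match
print(none_match_matches('"1234-gzip"', '"1234"'))       # pre-filter etag: no match
```

The second call shows the bug under discussion: comparing against the etag
as it stood before the filter ran misses a valid cached variant, even though
the comparison itself never looked inside the tag.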

> However, I disagree with Roy in that we most certainly *do* treat the
> ETag values as opaque - Subversion has its own ETag values - Roy's
> position only works if you assume the core is assigning the ETag value
> which has a set format - not a third-party module.  IMO, any valid
> solution that we deploy must work *independently* of what any module
> may set ETag to.  It is perfectly valid for a 3rd-party module to
> include "-gzip" at the end of their ETag.  For example, if you had a
> file called "foo-gzip" in revision 10, SVN would assign the ETag
> "10//foo-gzip".  (And, I could construct a conflict where httpd would
> hork the ETag incorrectly for any arbitrary value.)  -- justin

No, you assume a broken implementation of ap_meets_conditions.
If the function is implemented correctly, it still treats each etag
as opaque.  Given an etag of "X", the deflate variant will be "X-gzip".
If the content generator generates "10//foo-gzip", then the deflate
filtered variant will use "10//foo-gzip-gzip".  The result is correct
and no butt-ugly hacks are needed.  The same can be done for any
content-changing filter, and the result will work as specified as
long as the ordering of filters is consistent (and even if it isn't,
the result remains correct with a slight inefficiency).
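
As an illustration only, the derivation can be sketched in a few lines of
Python.  The helper name variant_etag and the "-" separator handling are
assumptions made for the sketch, not httpd code; the point is that the
suffix is appended inside the quotes without ever parsing the original
value.

```python
def variant_etag(etag: str, suffix: str) -> str:
    """Derive the etag of a filtered variant from an opaque input etag.

    The input is never parsed beyond locating its closing quote, so an
    etag that already ends in "-gzip" simply gains a second suffix.
    """
    weak = etag.startswith("W/")
    body = etag[2:] if weak else etag
    if len(body) < 2 or not (body.startswith('"') and body.endswith('"')):
        raise ValueError("not a quoted entity-tag: %r" % etag)
    derived = body[:-1] + "-" + suffix + '"'
    return ("W/" if weak else "") + derived

print(variant_etag('"1234"', "gzip"))          # "1234-gzip"
print(variant_etag('"10//foo-gzip"', "gzip"))  # "10//foo-gzip-gzip"
```

Because the identity variant keeps "10//foo-gzip" and the deflated variant
gets "10//foo-gzip-gzip", the two never collide, whatever the content
generator chose as its own etag.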

The hard part is to fix ap_meets_conditions and/or the metadata
generating aspect of 2.x filters, which may not even be possible
without an API rewrite.  I am not asking anyone to accept partial
solutions -- it is simply a bug that needs to be fixed, and I see no
reason to assume that we can't fix it right.  Just give me some time
to get trunk working on OS X again and remember how to code in C.

The one problem that might come up is if subversion is incorrectly
using mod_deflate as a makeshift content-encoding replacement for
transfer-encoding *and* assuming that the entity-tag will remain the
same for both variants when it makes webdav requests.  In that case,
subversion needs transfer encoding now, not later.

There is absolutely nothing new here.  Dynamic content encodings
were discussed back in 1993 when the Content-Encoding field was
(mistakenly) introduced, and transfer encoding was specifically
intended to avoid these problems.  IIRC, I had the same discussion
with Jeff Mogul over ten years ago and the algorithm that he put
in the spec does work if and only if it is implemented completely.
The reason it is specified in this way is because there are an
infinite number of ways that request header field values could vary
and yet (almost always) a very small number of active representations.
Therefore, an efficient cache should rely on the origin server's
instructions rather than assume that different selecting headers
implies a different representation will be selected.  The standard
was crafted to make caching as efficient as possible.  I just wish
it was easier to read.

When it comes to HTTP, Apache leads -- it does not follow blindly,
nor does it wait for spineless browser developers to introduce a
feature first.  We implement it and that act alone is sufficient
to break the chicken-and-egg dilemma that everyone else uses to
justify their lack of priorities.

....Roy
