httpd-dev mailing list archives

From Henrik Nordstrom <...@squid-cache.org>
Subject Re: Wrong etag sent with mod_deflate
Date Sun, 10 Dec 2006 04:55:34 GMT
On Sat, 2006-12-09 at 20:38 -0500, TOKILEY@aol.com wrote:

> If you are referring to Justin quoting ME let me supply a big
> fat MEA CULPA here and say right now that I haven't looked
> at the SQUID Vary/ETag code since the last major release
> and I DO NOT KNOW FOR SURE what SQUID is doing ( or
> not doing ) if/when it sees the same (strong) ETag for both
> a "compressed" and an "identity" version of the same entity.

That's not the problem. The problem is that Apache tells us that we
should use whatever we got first on all subsequent responses.

The chain of events leading to the problem is as follows:

1. We forward request A. Let's say it carries Accept-Encoding: gzip.

2. Apache mod_deflate returns a gzipped entity with ETag
"6bf1f7-6-1b6d6340" and Vary: Accept-Encoding.

3. We get another request with a different Accept-Encoding value. This
gets forwarded to Apache with an If-None-Match header listing the ETags
of the entities we have, i.e. If-None-Match: "6bf1f7-6-1b6d6340".

4. The entity hasn't changed and Apache responds with a 304 carrying
ETag "6bf1f7-6-1b6d6340", telling us that the valid response entity for
this request is the previously received response with ETag
"6bf1f7-6-1b6d6340", plus any updated HTTP headers for that response.

The problem arises in '4'.
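
A minimal sketch of the four steps above, in Python. Everything here is
a hypothetical illustration and not Squid or Apache code: the Cache
class, the origin() function, and the "-gzip" ETag suffix used for the
compliant case are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StoredEntity:
    etag: str
    content_encoding: str  # "gzip" or "identity"


@dataclass
class Cache:
    # Entities held for one URL, indexed by ETag (hypothetical layout).
    by_etag: dict = field(default_factory=dict)

    def store(self, entity):
        self.by_etag[entity.etag] = entity

    def if_none_match(self):
        # ETags of the entities we already have (step 3).
        return list(self.by_etag)

    def on_304(self, etag):
        # A 304 names the stored entity that is valid for this request.
        return self.by_etag[etag]


def origin(accept_encoding, if_none_match, distinct_etags):
    """Toy origin server; returns (status, etag, content_encoding)."""
    base = '"6bf1f7-6-1b6d6340"'
    use_gzip = "gzip" in accept_encoding
    encoding = "gzip" if use_gzip else "identity"
    etag = base
    if distinct_etags and use_gzip:
        # Assumed scheme: give the gzipped entity its own ETag.
        etag = base[:-1] + '-gzip"'
    if etag in if_none_match:
        return 304, etag, encoding
    return 200, etag, encoding


def replay(distinct_etags):
    cache = Cache()
    # Steps 1-2: request A with Accept-Encoding: gzip.
    status, etag, enc = origin("gzip", [], distinct_etags)
    cache.store(StoredEntity(etag, enc))
    # Step 3: request B without gzip, revalidated against known ETags.
    status, etag, enc = origin("", cache.if_none_match(), distinct_etags)
    if status == 304:
        served = cache.on_304(etag)  # Step 4
    else:
        served = StoredEntity(etag, enc)
        cache.store(served)
    return status, served.content_encoding


print(replay(distinct_etags=False))  # shared ETag
print(replay(distinct_etags=True))   # one ETag per encoding
```

With a shared ETag the second client, which did not ask for gzip, is
handed the gzipped entity because the 304 points at it. With distinct
ETags the If-None-Match does not match and the origin sends a fresh
identity response.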

> Period. I DO NOT KNOW FER SURE.

Then stop saying that Squid is broken, that it does not implement X, or
referring to "broken clients such as Squid". That is all I ask. It is
fine to say that you do not understand why it is a problem for Squid.

> In my other posts, I was suggesting, however, that even if
> an upstream content server ( Apache ) is not sending separate
> unique ETags I am still having a hard time understanding why
> that would cause SQUID to deliver the wrong "Varied" response
> back to the user.

Simply because Apache explicitly tells it to do exactly that in its 304
response.

> A compressed version of an entity IS the same entity...

Nope. It's a different representation of the same resource, but not
the same entity in terms of HTTP. This is the key difference between
Content-Encoding and Transfer-Encoding.

Content-Encoding is a property of the entity.

Transfer-Encoding is a property of how the message is sent, just like
chunked, with no implications on the entity.

The problem arises from trying to use Content-Encoding as if it was
Transfer-Encoding.
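
The distinction can be illustrated with a short Python sketch. The
chunked_encode(), chunked_decode(), and entity_digest() helpers are
hypothetical names written for this example, and the SHA-256 digest is
only a stand-in for "which entity is this"; it is not how any server
computes its ETags.

```python
import gzip
import hashlib


def chunked_encode(body: bytes, size: int = 4) -> bytes:
    """Apply the chunked transfer coding to a message body."""
    out = b""
    for i in range(0, len(body), size):
        chunk = body[i:i + size]
        out += f"{len(chunk):x}\r\n".encode() + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # last chunk, no trailers


def chunked_decode(wire: bytes) -> bytes:
    """Undo the chunked transfer coding (no trailers, no extensions)."""
    body = b""
    pos = 0
    while True:
        eol = wire.index(b"\r\n", pos)
        length = int(wire[pos:eol], 16)
        if length == 0:
            return body
        start = eol + 2
        body += wire[start:start + length]
        pos = start + length + 2  # skip the CRLF after the chunk data


def entity_digest(entity_body: bytes) -> str:
    """Stand-in identity for an entity: a digest of its bytes."""
    return hashlib.sha256(entity_body).hexdigest()


identity = b"hello\n"

# Transfer-Encoding: removed by the recipient, the entity is unchanged.
received = chunked_decode(chunked_encode(identity))
print(entity_digest(received) == entity_digest(identity))

# Content-Encoding: the coded bytes ARE the entity, so it is a new one.
gzipped = gzip.compress(identity)
print(entity_digest(gzipped) == entity_digest(identity))
```

The first comparison holds because the transfer coding is stripped
before the entity is looked at. The second does not, which is why the
gzipped response needs an ETag of its own.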

Many years ago we had the same discussion about Vary, and when the dust
settled everyone understood the problem of not sending a correct Vary in
the responses. Now, as the cache implementation evolves, we are hitting
the exact same problem again in a different form, this time due to ETag
collisions. I am sorry that we did not realize the full extent of the
brokenness of these responses the first time, when Vary was discussed.

> for
> all intents and purposes... it just has "compression" 
> applied. One cannot possibly become "stale" without the
> other also being "stale" at the same exact moment in time.

HTTP does not make this strict freshness relation between entities of
the same URI, but that's a different question and generally not a big
problem.

> At the moment... yes... I do... but if you read my other posts I
> also have a feeling the reason I can't quote you Verse and Chapter
> from an RFC is because I have a sneaking suspicion that there
> is something "missing" from the ETag/Vary scheme that can 
> lead to problems like this... and it's NOT IN ANY RFC YET.

And what I am saying is that Apache mod_deflate is violating a MUST
level requirement on ETag in the RFC, thereby making the caching section
of the same RFC break down.

> In other words... you may be doing exactly what hours and hours
> of reading an RFC seems to be telling you you SHOULD do... but
> there still might be something "else" that OUGHT to be done.

And I am telling you that this part of the RFC is complete, save for the
small detail that the server cannot signal that both the compressed and
identity encodings become stale when one changes, only one at a time.

> There will always be the chance that some upstream server will
> ( mistakenly? ) keep the same (strong) ETag on a compressed
> variant.

True, there will always be non-compliant implementations out there in
various forms, and they will continue causing problems at least for as
long as it's about MUST level violations. In many cases (this one
included) workarounds can be found, but that does not justify those who
are non-compliant intentionally remaining so once they have been
informed about the problem.

> People are not perfect and they make mistakes. I still
> think that even when that happens any caching software should
> follow the "be lenient in what you accept and strict in what you
> send" rule and still use the other information available to it

Which in this case is none. The only information we ever get from Apache
is the ETag of the supposedly valid to use response, and possibly new
freshness details about the same.

> ( sic: What the client really asked for and expects ) and 
> "do the right thing". Only the cache knows what the client
> is REALLY asking for.

There is a pretty clear distinction in the RFC on this. Caches obey
Vary; origin servers obey Accept-XXX. Yes, it can be argued whether
that's the best way of designing a protocol, but it's how it is
specified, and as long as at least the MUST level requirements are
implemented it works out reasonably well.
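
As a rough illustration of the cache side of that split, here is a small
Python sketch of selecting a stored variant from the Vary header. The
variant_key() function is a hypothetical helper, not Squid's
implementation, and it compares header values as plain strings, which is
a simplification.

```python
def variant_key(vary: str, request_headers: dict) -> tuple:
    """Build a lookup key from the request headers named in Vary."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    req = {k.lower(): v.strip() for k, v in request_headers.items()}
    # A header missing from the request counts as an empty value.
    return tuple((name, req.get(name, "")) for name in names)


vary = "Accept-Encoding"
key_gzip = variant_key(vary, {"Accept-Encoding": "gzip"})
key_plain = variant_key(vary, {"User-Agent": "example"})

print(key_gzip)
print(key_plain)
```

The cache never interprets Accept-Encoding itself here. It only notes
that the two requests differ in a header the origin listed in Vary, and
leaves the choice of entity to the origin server.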

Regards
Henrik
