httpd-dev mailing list archives

From Henrik Nordstrom <...@squid-cache.org>
Subject Re: Wrong etag sent with mod_deflate
Date Fri, 08 Dec 2006 21:28:54 GMT
On Fri, 2006-12-08 at 15:03 -0500, TOKILEY@aol.com wrote:

> To ONLY ever use ETag as the end-all-be-all for variant 
> identification is, itself, a mistake.

Well, this area of the HTTP specs is pretty clear in my eyes, but then
I have read it up and down too many times unwinding the tangled web
which is found in there.

An entity (including encoding) is identified by request URI +
Content-Location.

A specific version of an "entity" is identified by its unique ETag.

Vary: tells which headers the server used in server driven negotiation
of which entity to respond with. Accept-Encoding is one input to this.
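
As a rough illustration of what Vary means for a cache, here is a minimal
Python sketch. The function name variant_key and the dict-based request
headers are assumptions made for this example, not anything taken from
Squid or Apache. It derives a secondary cache key from the request
headers that the response's Vary header names:

```python
def variant_key(vary_header, request_headers):
    """Build a secondary cache key from the headers listed in Vary.

    vary_header: value of the response's Vary header, e.g. "Accept-Encoding"
    request_headers: dict of request header name -> value (hypothetical shape)
    """
    # Header names are case-insensitive, so normalise both sides.
    names = [n.strip().lower() for n in vary_header.split(",") if n.strip()]
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # Sort the names so "A, B" and "B, A" produce the same key.
    return tuple((n, lowered.get(n, "")) for n in sorted(names))
```

Two requests for the same URI that differ in Accept-Encoding get
different keys under this sketch, so they may map to different entities.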

A strong ETag must be unique among all variants of a given URI, that is
all different forms of entities that may reside under the URI and all
their past and future versions.

A weak ETag may be shared by two variants/versions if and only if they
can be considered semantically equivalent and mutually exchangeable at
the HTTP level with no semantic loss. For example different levels of
compression, or minor changes of negligible or no importance to the
semantics of the resource (hit counter example in the specs).
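
The difference between the two kinds of tag shows up in how they are
compared. Below is a minimal Python sketch of the strong and weak
comparison functions from RFC 2616 13.3.3. The helper names (parse_etag,
strong_match, weak_match) are made up for this example, and the parsing
assumes a single well-formed ETag value:

```python
def parse_etag(raw):
    """Split an ETag value into (is_weak, opaque_tag), keeping the quotes."""
    raw = raw.strip()
    is_weak = raw.startswith("W/")
    opaque = raw[2:] if is_weak else raw
    return is_weak, opaque


def strong_match(a, b):
    """Strong comparison: tags identical and neither one is weak."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a, b):
    """Weak comparison: tags identical, ignoring any W/ prefix."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

Under these definitions W/"X" and "X" match weakly but never strongly.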
 
> Both pieces of software ( SQUID and Apache ) need just a 
> little more code to finally "get it right".

It's correct that the current Squid implementation is not flawless. Most
notably it has very poor handling of cache invalidations at the moment.
 
> Don't forget about "Content-Length", either. 
> If 2 different responses for the same requested entity come
> back with 2 different Content-Lengths and there is no "Vary:"
> or "ETag" then regardless of any other protocol semantics the 
> only SANE thing for any caching software to do is to recognize 
> that, assume it is not a mistake, and REPLACE the existing 
> entity with the new one.

Caches by nature tend to replace what they have with what they get.

> Yea.. sure... you might get a lot of cache bounce that way but
> at least you are returning a fresh copy.

How would Content-Length changes cause cache bouncing?

> It is not possible for 2 EXACTLY identical representations of the
> same requested entity to have different content lengths.
> If the lengths are different, then SOMETHING is different with
> regards to what you have in your cache.

Yes, but when would this be seen?

We only get the ETag from Apache, not the Content-Length. The specs forbid
Apache from sending the Content-Length or other entity headers in 304
responses, partly to make sure entities do not get corrupted by errors in
the origin server side implementation of server driven content
negotiation.

> No protocol ( sic: set of rules ) can ever cover all the realities.
> ( Good ) software knows how to make "common sense"
> as well.

Indeed, and that is why we are going slow on implementing the more advanced
features of the specs. But violating MUST level protocol requirements is
not "common sense". And if you actually follow the specs these parts do
make great sense once you get the picture that ETags MUST be unique for
all entity versions of a given URI. The only poor part I have seen in
this area of the specs is that the If-None-Match condition is perhaps a
bit blunt only telling the end results, the ETag of the valid response
entity of a negotiated resource, not how the server came to that
conclusion. This adds a bit more roundtrips to the origin than would be
required only to figure out that "Content-Language: en" is ok both for
"Accept-Language: en" and "Accept-Language: en, sv", but that's about it.
(Yes, I intentionally avoided Accept-Encoding here to illustrate the
point, the mechanism is the exact same however).
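
To make the revalidation flow concrete, here is a small Python sketch of
a cache holding several variants of one URI. The VariantCache class and
its method names are hypothetical and greatly simplified; this is not
how Squid is implemented. The cache lists every ETag it holds in
If-None-Match, and the only thing a 304 tells it is which ETag is valid:

```python
class VariantCache:
    """Toy store of the cached variants of a single URI, keyed by ETag."""

    def __init__(self):
        self.variants = {}  # ETag string -> cached entity body

    def store(self, etag, body):
        self.variants[etag] = body

    def if_none_match(self):
        # Offer every cached variant to the origin for validation.
        return ", ".join(self.variants)

    def on_response(self, status, etag, body=None):
        if status == 304:
            # The 304 carries no entity; the ETag alone selects the variant.
            return self.variants.get(etag)
        # Any other response carries a new entity, which gets cached.
        self.store(etag, body)
        return body
```

Because the ETag is the only selector on a 304, two variants sharing one
ETag would be indistinguishable to a cache built along these lines.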

RFC 2616 3.11 Entity Tags

   A "strong entity tag" MAY be shared by two entities of a resource
   only if they are equivalent by octet equality.

   An entity tag MUST be unique across all versions of all entities
   associated with a particular resource. A given entity tag value MAY


See also 14.26 If-None-Match, and numerous other references to ETag.

I can bombard you with long chains of supporting claims from the RFC if
you like, depending on which parts of the equation you feel are loosely
connected. Just tell me which part you don't trust and I'll happily help
you see the light.

a) That identity and gzip content-encoding of the same resource
represent different entities of the same resource

b) That different entities of the same resource MUST have different
(strong) ETags.

c) That gzip and identity encoding are not semantically equivalent.

d) That the weak ETag W/"X" is semantically equivalent to the strong
ETag "X" with the same quoted value.
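
Points a) and b) can be checked with a few lines of Python. The sketch
below assumes a hash-of-octets ETag scheme purely for illustration (the
strong_etag helper is hypothetical and is not how Apache derives its
ETags). It shows that the identity and gzip forms of the same content
are different octet sequences, so they cannot share a strong ETag:

```python
import gzip
import hashlib


def strong_etag(body):
    """Hypothetical strong ETag: a quoted SHA-1 of the exact octets sent."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


# The same resource in two content-codings.
identity = b"hello world\n" * 20
compressed = gzip.compress(identity)

# Different octets on the wire, hence different strong ETags,
# even though decoding the gzip form recovers the identity form.
etag_identity = strong_etag(identity)
etag_gzip = strong_etag(compressed)
```

Any ETag scheme works as long as the two tags differ; hashing the octets
is just one convenient way to guarantee that in a sketch.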

Regards
Henrik
