httpd-dev mailing list archives

From Henrik Nordstrom <...@squid-cache.org>
Subject Re: Wrong etag sent with mod_deflate
Date Sat, 09 Dec 2006 22:15:44 GMT
On Sat, 2006-12-09 at 05:44 -0500, TOKILEY@aol.com wrote:

> It's relevant to the extent that I think there are still some things
> missing from the RFCs with regards to all this which is why a piece
> of software like SQUID might be "doing the wrong thing" as well.

After reading the RFC on this topic many, many times, I cannot agree
that it's that incomplete.

The scheme set by the RFC is quite complete as long as you stay with
strong ETags, allowing for cache correctness, update serialization and
many good things.

Situations requiring weak ETags also work out pretty well for cache
correctness thanks to If-None-Match, but not for other operations, as
weak ETags are banned from both non-GET/HEAD requests and If-Match
conditions.
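To make the strong/weak distinction concrete, here is a minimal sketch
of the two entity-tag comparison functions RFC 2616 defines (section
13.3.3); the function name and structure are my own illustration, not
anything from the spec:

```python
def etag_match(a: str, b: str, strong_required: bool) -> bool:
    """Compare two entity tags per RFC 2616 13.3.3.

    The strong comparison (needed for If-Match, ranges, and update
    serialization) never matches if either tag is weak; the weak
    comparison (acceptable for If-None-Match on GET/HEAD) ignores
    the W/ prefix and compares only the opaque parts.
    """
    weak_a, weak_b = a.startswith('W/'), b.startswith('W/')
    opaque_a = a[2:] if weak_a else a
    opaque_b = b[2:] if weak_b else b
    if strong_required and (weak_a or weak_b):
        return False
    return opaque_a == opaque_b
```

This is why weak ETags still serve cache validation but cannot
serialize updates: any operation requiring the strong comparison
simply never matches them.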
  
> ...and, currently, if the cache has stored both a compressed and
> and non-compressed version of the same entity received from Apache
> ( sic: mod_deflate ) then the same ( strong ) ETag is returned
> in the conditional GET for both of the cached variants.
>  
> Hmmm... begins to look like a problem... but is it really?... 

It is.

See "13.6 Caching Negotiated Responses" (all of it). And then skim over
"14.26 If-None-Match", and finally read "10.3.5 304 Not Modified". Then
piece them together.

Also take note that nowhere is there any requirement on the cache to
evaluate any server-driven content negotiation inputs (Accept-XXX etc).
This responsibility rests fully with the origin server and is reflected
back via the ETag.

Caches evaluate Vary in finding the correct response entity.
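A rough sketch of that division of labour, under my own assumed data
layout (each cached variant remembers the request headers it was stored
under and the Vary field of its response):

```python
def select_variant(request_headers, cached_variants):
    """Pick the cached response whose stored request headers match the
    current request on every header named in its Vary field.

    Note the cache never interprets Accept-* semantics itself; it only
    compares the raw values of the headers the origin said mattered.
    """
    for variant in cached_variants:
        vary = variant['vary']                    # e.g. ['Accept-Encoding']
        if all(request_headers.get(h) == variant['request'].get(h)
               for h in vary):
            return variant
    return None  # no match: forward the request to the origin
```

The origin does the actual negotiation; the cache just replays its
decisions by raw header comparison.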

> > If the server says that any one of the representations,
> > as indicated by the ETag in a 304 response, is okay, 
>  
> "okay" means "fresh".

Not only that, it also tells which entity among the N cached ones is
valid to send as the response to this request.

> happen to share the same (strong) ETag... if SQUID is delivering
> stale compressed variants when a 304 response says that the
> original "identity" variant is "not fresh" then that's just
> a colossal screw-up in the caching code itself.

The 304 says: send the entity with the ETag "XXX", it's still fresh.
Nothing more. It does not indicate whether this is the identity or the
gzip-encoded representation, nor the content length, content type or
anything else relevant to the actual content besides the ETag and/or
Content-Location.
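In cache terms, all a 304 lets you do is look up a stored entity by its
ETag and mark it fresh; a minimal sketch (the dict shape is my own
illustration):

```python
def apply_304(resp_etag, cached_entities):
    """Handle a 304 Not Modified: the only thing identifying *which*
    stored entity to serve is the ETag (or Content-Location) echoed by
    the origin -- the 304 carries no body, length, or encoding info.
    """
    for entity in cached_entities:
        if entity['etag'] == resp_etag:
            entity['fresh'] = True
            return entity
    return None  # nothing stored under that tag: must refetch
```

So if two different representations were stored under the same strong
ETag, this lookup has no way to tell them apart, which is exactly the
mod_deflate problem.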
 
> Regardless of what the server says... how could you ever get
> into a situation where you would consider a compressed variant
> of an entity "fresh" when the "identity" version is now "stale"? 

As HTTP did not consider dynamic content encoding, it sees the two
entities as different objects (i.e. file and file.gz) and does not
enforce strict synchronization between the two. The only requirement
set in the RFC is that the origin server SHOULD keep the two
representations on the server in sync.

> is seriously confused even if the ETags are the same and the
> cache is sending back "stale" compressed variants when the
> "identity" variant ( strong ETag value ) is also "stale". 

I don't know what condition you refer to here. The Squid cache (2.6)
only remembers the last seen of the two, as the later response with the
same ETag overwrites the first.

> There's still something missing from the specs or something.

Not that I can tell.
 
> When an exact, literal interpretation of a spec tends to 
> defy common sense... my instinct is to suspect the spec itself.

In what way? There is something in your reasoning I don't get.
  
> DCE ( Dynamic Content Encoding ) is a valid concept even if it
> wasn't sufficiently "imagined" at the time the specs were
> codified. It works. It works WELL... and it is something that
> OUGHT to always be possible if the RFCs mean anything at all.

And it is possible. Just that you need to pay attention to

  Content-Location
  ETag
  Content-MD5

as all of these are affected by dynamically altering the entity via
server-driven content negotiation with static or dynamic recoding of
the entity.
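As a sketch of what "paying attention" could look like when a filter
recodes the entity on the fly (the helper name is mine; appending the
coding name to the ETag is one approach, similar in spirit to what
mod_deflate later adopted, not a requirement of the spec):

```python
def recode_headers(headers, coding='gzip'):
    """Adjust the entity-identifying headers when a response body is
    dynamically recoded.

    - ETag gets a per-coding suffix so each representation keeps a
      distinct strong tag (assumed convention, not mandated by HTTP).
    - Content-MD5 is dropped: it no longer matches the bytes sent and
      would have to be recomputed over the recoded body.
    """
    h = dict(headers)
    if h.get('ETag', '').endswith('"'):
        h['ETag'] = h['ETag'][:-1] + '-%s"' % coding
    h.pop('Content-MD5', None)
    h['Content-Encoding'] = coding
    return h
```

With distinct ETags per representation, the 304 ambiguity discussed
above simply cannot arise.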

> One of the main "prime directives" for developing Apache 2.0
> at all was to finally re-org the IO stream so that schemes
> like DCE could be done more easily than were already being
> done in the 1.3.x framework. Mission was accomplished.
> Filtering was born. It would be a shame to consider abandoning
> one of the very concepts that gave birth to Apache 2.0 for 
> the sake of a few more lines of code that could take it
> into the "end zone".

Agreed.
 
> No argument here. Transfer-encoding is about a DECADE overdue now.

And, as already indicated, it should be a piece of cake to add to
mod_deflate; as HTTP support evolves in clients and caches, it is
likely to lessen the complexity of dealing with mod_deflate and
conditionals considerably.
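The reason Transfer-Encoding sidesteps the whole problem can be
summarized in two header sets (illustrative values, not copied from any
real exchange):

```python
# Content-Encoding describes the entity itself: it is end-to-end,
# cached as-is, and therefore needs a distinct ETag per coding plus
# Vary to keep variants apart.
content_coded = {
    'ETag': '"abc-gzip"',
    'Content-Encoding': 'gzip',
    'Vary': 'Accept-Encoding',
}

# Transfer-Encoding is hop-by-hop: it is removed before the entity is
# stored, so there is only one entity, one strong ETag, and no variant
# bookkeeping at all.
transfer_coded = {
    'ETag': '"abc"',
    'Transfer-Encoding': 'gzip, chunked',
}
```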
 
> In the case of compressed entities it would still be a good idea
> to always add a standard header which indicates the original
> uncompressed content-length ( if it's possible to know it ).

There is no such header in HTTP, but you are free to propose one. It's
worth noting, though, that this information already exists in the gzip
encoding itself.
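Concretely: the gzip format ends with a four-byte ISIZE field holding
the uncompressed length modulo 2^32, little-endian, so the original
size already travels with the compressed body (the helper name is my
own):

```python
import gzip
import struct

def gzip_isize(data: bytes) -> int:
    """Read the ISIZE trailer of a gzip stream: the last four bytes,
    little-endian, giving the uncompressed length modulo 2**32."""
    return struct.unpack('<I', data[-4:])[0]

payload = b'x' * 1000
assert gzip_isize(gzip.compress(payload)) == 1000
```

A client that keeps the trailer around can thus recover the original
length without any extra header.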

Current specs do not handle Content-Encoding very differently from
Content-Language. Perhaps they should, but that's a different
discussion.

Personally, my view on what I think you are talking about here (how to
detect updates while an earlier response is fresh) is that perhaps the
invalidation mechanisms need to be extended a bit beyond the simple
"URI" and "Content-Location" scheme existing today, where some
methods/responses invalidate full URIs and some only the entity of the
URI with a specific Content-Location.
 
> If Transfer-encoding ever becomes a reality you will see the need
> for DCE decrease. It is actually the CACHES themselves that need
> TE capability more than the Server/Cache sub-links. 

Agreed, and this is because it's only shared caches which are likely to
see requests with different capabilities for the same URI.

> More often than not... it is the CACHES that are handling the
> "last mile", which is where compression makes the biggest difference.

It's also important to sites paying for the bandwidth they use. I have
customers using compression not because it speeds up access to the site
but because it reduces their bandwidth usage on multi-gigabit links.

Regards
Henrik
