httpd-dev mailing list archives

From "Roy T. Fielding" <field...@gbiv.com>
Subject Re: Wrong etag sent with mod_deflate
Date Sat, 09 Dec 2006 05:52:43 GMT
On Dec 8, 2006, at 3:35 PM, Justin Erenkrantz wrote:

> On 12/8/06, Roy T. Fielding <fielding@gbiv.com> wrote:
>> What we should be doing is sending transfer-encoding, not
>> content-encoding, and get past the chicken and egg dilemma of that
>> feature in HTTP.
>> If we are changing content-encoding, then we must behave as if there
>> are two different "files" on the server representing the resource.
>> That means tweaking the etag and being prepared to handle that tweak
>> on future conditional requests.
>
> There's just no way to know how to handle any ETag modification on
> future requests.  So, that's a non-starter.  Therefore, any fix for
> this edge case which breaks cacheability in the common case of real
> browsers I would find unacceptable.

It isn't necessary to handle "any" ETag modification -- our ETag
generation is fairly limited and is not opaque to the server.
We only need to avoid conflicts between the content-encoded variant
and the non-encoded variant, which is guaranteed if the encoded
variant has "-gzip" appended to the existing entity-tag.  That will
work fine with the common case of real browsers -- far better than
the current case which will deliver invalid content if a browser
tries to complete a partial download from a cache.
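
As a rough illustration of the suffix approach (a sketch only, not
mod_deflate code; the function names are hypothetical and it assumes
the entity-tag is a quoted string, optionally prefixed with W/):

```python
def add_encoding_suffix(etag: str, coding: str = "gzip") -> str:
    """Return the entity-tag to send with the content-encoded variant."""
    weak = etag.startswith("W/")
    opaque = etag[2:] if weak else etag
    if not (len(opaque) >= 2 and opaque.startswith('"') and opaque.endswith('"')):
        raise ValueError(f"not a quoted entity-tag: {etag!r}")
    # Append the suffix inside the quotes so the result is still a valid tag.
    tagged = f'"{opaque[1:-1]}-{coding}"'
    return "W/" + tagged if weak else tagged


def strip_encoding_suffix(etag: str, coding: str = "gzip") -> str:
    """Map a variant tag seen in a conditional request back to the base tag."""
    suffix = f'-{coding}"'
    if etag.endswith(suffix):
        return etag[: -len(suffix)] + '"'
    return etag  # already the tag of the non-encoded variant
```

Because the server generated the base tag itself, it can recognize
the suffixed form on a later If-None-Match or If-Range and compare the
stripped value against the tag of the underlying file.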

>> In other words, Henrik has it right.  It is our responsibility to
>> assign different etags to different variants because doing otherwise
>> may result in errors on shared caches that use the etag as a variant
>> identifier.
>
> As Kevin mentioned, Squid is only using the ETag and is ignoring the
> Vary header.  That's the crux of the broken behavior on their part.

Then they will still be broken regardless of what we do here.  It simply
isn't a relevant issue.

> If they want to point out minor RFC violations in Apache, then we can
> play that game as well.  (mod_cache deals with this Vary/ETag case
> just fine, FWIW.)

Unlike Squid, RFC compliance is part of our mission, at least when
it isn't due to a bug in the spec.  This is not a bug in the spec.

A high-efficiency response cache is expected to have multiple
representations of a given resource cached.  The cache key is the
URI.  If the set of varying header field values that generated the
cached response is different from the request set, then a
conditional GET request is made containing ALL of the cached
entity tags in an If-None-Match field (in accordance with the Vary
requirements).  If the server says that any one of the representations,
as indicated by the ETag in a 304 response, is okay, then the cached
representation with that entity tag is sent to the user-agent
regardless of the Vary calculation.  In short, if we have two active
representations that have the same etag, then we have violated the
spec and created an unnecessary interoperability problem:

    If the selecting request header fields for the cached entry do not
    match the selecting request header fields of the new request, then
    the cache MUST NOT use a cached entry to satisfy the request unless
    it first relays the new request to the origin server in a
    conditional request and the server responds with 304 (Not Modified),
    including an entity tag or Content-Location that indicates the
    entity to be used.

    If an entity tag was assigned to a cached representation, the
    forwarded request SHOULD be conditional and include the entity tags
    in an If-None-Match header field from all its cache entries for the
    resource. This conveys to the server the set of entities currently
    held by the cache, so that if any one of these entities matches the
    requested entity, the server can use the ETag header field in its
    304 (Not Modified) response to tell the cache which entry is
    appropriate. If the entity-tag of the new response matches that of
    an existing entry, the new response SHOULD be used to update the
    header fields of the existing entry, and the result MUST be returned
    to the client.

In other words, the conditional request containing all of the entity
tags satisfies the semantics of Vary when the server responds with
304 and one of those entity tags.
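
To make the failure mode concrete, here is a small sketch of how such
a cache might pick an entry after a 304. It is an illustration under
simplifying assumptions, not the code of any real cache; the Entry
class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    etag: str        # entity-tag stored with the cached representation
    selecting: dict  # request header values that produced this entry
    body: bytes


def if_none_match(entries):
    """Build the If-None-Match value listing every cached entity-tag."""
    return ", ".join(e.etag for e in entries)


def select_after_304(entries, etag_from_304):
    """Pick the cached entry named by the ETag in the 304 response."""
    matches = [e for e in entries if e.etag == etag_from_304]
    if len(matches) != 1:
        # Two variants with one tag: the cache cannot tell them apart.
        raise LookupError(f"{len(matches)} entries carry {etag_from_304}")
    return matches[0]
```

With distinct tags such as "abc" and "abc-gzip" the lookup is
unambiguous. If both variants carry "abc", the cache has no way to
know whether to return the identity bytes or the gzip bytes.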

And, no, mod_cache doesn't "deal with it" -- it just isn't a
very efficient cache.

> The compromise I'd be willing to accept is to have mod_deflate support
> the 'TE: gzip' request header and add 'gzip' to the Transfer-Encoding
> bit - and to prefer that over any Accept-Encoding bits that are sent.
> The ETag can clearly remain the same in that case - even as a strong
> ETag.  So, Squid can change to send along TE: gzip (if it isn't
> already).  And, everyone else who sends Accept-Encoding gets the
> result in a way that doesn't pooch their cache if they try to do a
> later conditional request.
>
> Is that acceptable?  -- justin

The best solution is to not mess with content-encoding at all, which
gets us out of both this consistency problem and related problems
with the entity-header fields (content-md5, signatures, etc.).
That is why transfer encoding was invented in the first place.

We should have an implementation of deflate as a transfer encoding,
but it should be configurable independent of the existing filter.
Some people will want TE specifically to avoid the addition of Vary
and all the other problems that content-changing filters cause.
For example, there could be an additional directive option selecting
CE, TE, or "either".
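
One way such an option could behave is sketched below. No such
directive exists as of this message; the mode names, the function
names, and the rule that TE wins under "either" (taken from Justin's
proposal quoted above) are all assumptions. The header parsing is
deliberately minimal: it ignores "*" and the Connection: TE
requirement.

```python
def _accepts(header_value: str, coding: str) -> bool:
    """True if the header lists `coding` with a non-zero q-value."""
    for item in header_value.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != coding:
            continue
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(value) > 0
                except ValueError:
                    return False
        return True  # listed with no q parameter
    return False


def choose_coding(mode: str, headers: dict, coding: str = "gzip"):
    """Return "transfer", "content", or None for mode CE, TE, or either.

    `headers` is assumed to use the exact keys "TE" and "Accept-Encoding".
    """
    te_ok = _accepts(headers.get("TE", ""), coding)
    ce_ok = _accepts(headers.get("Accept-Encoding", ""), coding)
    if mode in ("TE", "either") and te_ok:
        return "transfer"  # entity unchanged: no Vary, same ETag
    if mode in ("CE", "either") and ce_ok:
        return "content"   # new variant: needs Vary and its own ETag
    return None            # send the response uncompressed
```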

The existing filter needs to modify the ETag field value (and
any other entity-dependent values that we can think of) or be
removed as a feature.  Weak etags are not a solution -- being able
to make range requests of large cached representations requires a
strong etag, and it really isn't hard to provide one.  It is better
to not deflate the response at all than to interfere with caching.
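
The reason weak etags do not help can be shown with the two
comparison functions the spec defines. This is a sketch with
hypothetical function names, treating entity-tags as plain strings:

```python
def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and identical."""
    return not a.startswith("W/") and not b.startswith("W/") and a == b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: the tags match once any W/ prefix is ignored."""
    def opaque(tag: str) -> str:
        return tag[2:] if tag.startswith("W/") else tag
    return opaque(a) == opaque(b)


def can_resume_range(if_range_tag: str, current_tag: str) -> bool:
    """A partial download may only be completed on a strong match."""
    return strong_match(if_range_tag, current_tag)
```

A weak tag is enough for a plain full-body conditional GET, but
can_resume_range() is false whenever either tag is weak, so the client
has to fetch the whole representation again.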

In any case, I won't accept anyone's votes on this issue until there
is a patch that can be voted on, and the technical considerations of
security and correctness take priority over other trade-offs.  RTC.

....Roy
