httpd-dev mailing list archives

From TOKI...@aol.com
Subject Re: Wrong etag sent with mod_deflate
Date Sun, 10 Dec 2006 01:33:42 GMT
Hi Henrik...
Kevin again...

That's a FANTASTIC response. THANKS! I mean it.

As I think I told you years ago when we argued the whole Vary: compression
deal with regards to how broken client browsers really are... I admire someone
who can "stick to their guns".

We never got into the ETag thing at the time because the first implementation
of the "Vary:" scheme in SQUID ( at that time ) which we were tossing around
wasn't even really making any attempt to handle "ETag" at all. 

That was to come later ( in SQUID ), IIRC.

Your 4 step "Chain of Events" below brings a clarity to the discussion
that was lacking up to this point.

I am going to go back to 3 posts ago and say again that I really do AGREE
with you ( and Roy ). Unless the discussion turns to changing the RFC specs
( not likely ) then Apache is exhibiting broken behavior for a MUST
requirement and it should be fixed. Exactly HOW it should be fixed in order
to satisfy ap_meets_conditions() ( not your concern, I know ) would be the
only remaining discussion to have.

However... I am going to also stand by my original statement that if any
cache software has information available to it that it can use to handle
confusing results from a COS ( Content Origin Server ) and still 
"do the right thing" then that ( in my opinion ) can/should be a MUST 
requirement that is still missing from the RFC specs.

It might even look something like this...

"If, following a freshness check to the COS, a cache system receives 
an ETag response value which matches EITHER the "identity" variant
of a stored response OR the "compressed" variant of the same response
then the cache system MUST refer to its own onboard client request 
information and deliver the correct variant to the client based on the
headers of the (current) request. If the cache system detects confusing
response fulfillment information from a COS then it CAN/SHOULD
eject the variants and let the request pass through to the COS."

To me, this would follow the "be lenient in what you accept and strict
in what you send" policy which the RFCs themselves advocate at
all times.

I know, I know... I've already lost you... but as long as I'm wandering
in the woods, then... let me do a breakdown on your "Chain of Events"
that takes things farther into the backwoods just on the chance you
might see my point of view on all of this.

I just want all of this to WORK the way it's supposed to, and that's
all the way down the line from Server to Cache to Client. I want
all the components to "do the right thing" whenever they can
and not act so brain-dead all the time.

> Henrik wrote...
>
> The chain of events leading to the problem is as follows:
>
> 1. We forward request A. Let's say this claims Accept-Encoding: gzip.

Okay. 
Life is good.

> 2. Apache mod_deflate returns a gzip:ed entity with ETag
> "6bf1f7-6-1b6d6340" and Vary: Accept-Encoding.

Yep.
It returns BOTH an "ETag" AND "Vary: Accept-Encoding"...
and you ( supposedly ) store BOTH pieces of information and
associate them with the current request set. You also have 
the "Content-encoding: gzip" response header to store as well 
for further indication of which "variant" this response represents.

So we now KNOW that THIS ETAG represents the "compressed"
version of the entity and we can "remember" that if/when we want to. 
Life is still good.

> 3. We get another request with a different Accept-Encoding value.

Of course. Inevitable.
For the sake of simplicity let's assume the "different Accept-Encoding
value" is that the client didn't send any "Accept-Encoding" header at all.
That would be the most common case.

> This gets forwarded to Apache with an If-None-Match header telling the ETags
> of the entities we have, i.e. If-None-Match "6bf1f7-6-1b6d6340".

Right. 
So in this scenario all you have cached so far is the one compressed variant
of the entity that would only satisfy a request that arrives with
"Accept-Encoding: gzip"
( The VARY condition that came back on the original response with that ETag ).
Moving on...

> 4. The entity hasn't changed and Apache responds with a 304 ETag
> "6bf1f7-6-1b6d6340" telling us that the valid response entity for this
> request is the previous received response with ETag "6bf1f7-6-1b6d6340",
> and any updated HTTP headers for that response.
>
> The problem arises in '4'.

Problem arising... yes... but should that be "end of story" and
should the poor slob making the request get the wrong thing?

I think not. ( and obviously here is where we part company ).

Here's where common sense seems to be thrown out the window for
the sake of following an RFC.

You just got an "ETag" response back from a COS that is telling you
to use a response that you KNOW IS WRONG... if you are paying
any attention to the "Vary:" condition that is associated with that
cached response.

Your cached response DOES NOT MATCH the "Vary:" requirement,
regardless of what the "ETag" response seems to be telling you.

Common sense, ( to me, anyway ) would dictate that you should
recognize that you don't have what you really need to fulfill the
response according to the "Vary:" requirement and go upstream
for a variant of the entity that DOES satisfy the ( known ) "Vary:"
requirement.
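
A minimal sketch of that check, purely for illustration. The function
names and the dictionary layout are hypothetical, not SQUID or Apache
code, and header names are assumed to be stored in lower case. It only
shows the decision being argued for here: compare the headers named in
the stored "Vary:" against the current request before trusting the 304.

```python
def matches_vary(vary, stored_request, current_request):
    """True if every header named in Vary has the same value in both requests."""
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    if "*" in names:
        # "Vary: *" means the cache cannot decide on request headers alone.
        return False
    return all(stored_request.get(n) == current_request.get(n) for n in names)


def handle_304(stored, current_request):
    """Decide what to do when the origin answers 304 naming a stored ETag."""
    if matches_vary(stored["vary"], stored["request"], current_request):
        return "serve-stored"
    # The stored variant was selected by different request headers:
    # do not serve it, go back upstream without the conditional.
    return "refetch-from-origin"


# Hypothetical stored entry: the gzip variant from the first request.
stored = {"vary": "Accept-Encoding", "request": {"accept-encoding": "gzip"}}

decision_plain = handle_304(stored, {})
decision_gzip = handle_304(stored, {"accept-encoding": "gzip"})
```

With the stored gzip variant above, a request carrying no
Accept-Encoding header ends up in the "refetch-from-origin" branch
instead of being handed the compressed body.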

> "... telling us that the valid response entity for this
> request is the previous received response with 
> ETag "6bf1f7-6-1b6d6340",......."

If Apache told you to jump off a bridge, would you do it?

You KNOW that this "previously received response" does NOT
match your ( current request ) VARY condition. Why would you
knowingly send the WRONG THING?

I suppose I am totally flame-bait at this point but maybe I am
so lost in the woods a good forest-fire is what will light the
way home.

We obviously are going to always have very different perspectives
about what software "ought" to be doing. I think it should be as
"smart" as it can be or it's not finished yet. I think when it KNOWS
what the right thing to do is... it should go ahead and do that.

Let me finish where I started...

GREAT RESPONSE. Thank you!

I agree that you should be seeing better ETags, and whatever screw-ups
happen when you don't are not actually, truly, legally your fault,
regardless of what other information you might have at your fingertips
to make better decisions.

Actually maybe what you are NOT doing correctly is noticing that
something is quite "amiss" with the responses from the COS.

If there is even the slightest evidence that the COS is telling you
to "jump off a bridge" and that it might be WRONG the only safe
thing to do is eject the variants and just let the request pass 
through to the Server itself until the dubious situation is resolved.

Better to do that than to ( knowingly ) send the wrong response.

Yours
Kevin


In a message dated 12/9/2006 8:56:45 PM Pacific Standard Time, 
hno@squid-cache.org writes:
Sat 2006-12-09 at 20:38 -0500, TOKILEY@aol.com wrote:

> If you are referring to Justin quoting ME let me supply a big
> fat MEA CULPA here and say right now that I haven't looked
> at the SQUID Vary/ETag code since the last major release
> and I DO NOT KNOW FOR SURE what SQUID is doing ( or
> not doing ) if/when it sees the same (strong) ETag for both
> a "compressed" and an "identity" version of the same entity.

That's not the problem. The problem is that Apache tells us that we
should use whatever we got first on all subsequent responses.

The chain of events leading to the problem is as follows:

1. We forward request A. Let's say this claims Accept-Encoding: gzip.

2. Apache mod_deflate returns a gzip:ed entity with ETag
"6bf1f7-6-1b6d6340" and Vary: Accept-Encoding.

3. We get another request with a different Accept-Encoding value. This
gets forwarded to Apache with an If-None-Match header telling the ETags
of the entities we have, i.e. If-None-Match "6bf1f7-6-1b6d6340".

4. The entity hasn't changed and Apache responds with a 304 ETag
"6bf1f7-6-1b6d6340" telling us that the valid response entity for this
request is the previous received response with ETag "6bf1f7-6-1b6d6340",
and any updated HTTP headers for that response.

The problem arises in '4'.
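
The four steps can be replayed in a small model. This is only a sketch:
the origin() function and the cache dictionary are hypothetical
stand-ins, not Apache or Squid code. The one assumption that matters is
the one described above, that the origin reuses a single ETag for both
encodings.

```python
ETAG = '"6bf1f7-6-1b6d6340"'


def origin(request_headers):
    """Toy origin server that hands out the same ETag for every encoding."""
    if request_headers.get("If-None-Match") == ETAG:
        # Step 4: "not modified, use the response you hold with this ETag".
        return {"status": 304, "ETag": ETAG}
    wants_gzip = "gzip" in request_headers.get("Accept-Encoding", "")
    return {
        "status": 200,
        "ETag": ETAG,
        "Vary": "Accept-Encoding",
        "Content-Encoding": "gzip" if wants_gzip else "identity",
    }


cache = {}  # ETag -> stored response (hypothetical cache layout)

# Steps 1 and 2: request A claims gzip, the gzip:ed entity is stored.
first = origin({"Accept-Encoding": "gzip"})
cache[first["ETag"]] = first

# Step 3: a request without Accept-Encoding is forwarded with the ETags we hold.
reply = origin({"If-None-Match": ETAG})

# Step 4: the 304 names the stored ETag, so the cache selects that entity.
served = cache[reply["ETag"]]
```

In this model the entity selected in step 4 still carries
Content-Encoding: gzip, although the second client never offered gzip.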

> Period. I DO NOT KNOW FER SURE.

Then stop saying that Squid is broken, that it does not implement X, or
talking about broken clients such as Squid. That is all I ask. It is fine
to say that you do not understand why it is a problem for Squid.

> In my other posts, I was suggesting, however, that even if
> an upstream content server ( Apache ) is not sending separate
> unique ETags I am still having a hard time understanding why
> that would cause SQUID to deliver the wrong "Varied" response
> back to the user.

Simply because Apache explicitly tells it to do exactly that in its 304
response.

> A compressed version of an entity IS the same entity...

Nope. It's a different representation of the same resource, but not
the same entity in terms of HTTP. This is the key difference between
Content-Encoding and Transfer-Encoding.

Content-Encoding is a property of the entity.

Transfer-Encoding is a property of how the message is sent, just like
chunked, with no implications on the entity.

The problem arises from trying to use Content-Encoding as if it was
Transfer-Encoding.
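
A small sketch of why the two entities need different strong ETags. The
strong_etag() helper is an assumption made for this example (a hash of
the exact entity bytes); it is not how Apache derives its ETags.

```python
import gzip
import hashlib


def strong_etag(entity_body: bytes) -> str:
    """Illustrative strong validator: a hash over the exact entity bytes."""
    return '"' + hashlib.sha1(entity_body).hexdigest()[:16] + '"'


identity_body = b"hello\n"
# Content-Encoding: gzip produces a different entity (different bytes).
# mtime=0 only keeps the gzip header stable between runs.
gzip_body = gzip.compress(identity_body, mtime=0)

identity_etag = strong_etag(identity_body)
gzip_etag = strong_etag(gzip_body)
```

The two bodies decode to the same content, yet they are not
byte-for-byte identical, so a validator computed over the entity comes
out different for each.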

Many years ago we had the same discussion about Vary, and when the dust
settled everyone understood the problem of not sending a correct Vary in
the responses. Now as the cache implementation is evolving we are hitting
the exact same problem again in a different form this time due to ETag
collisions. I am sorry that we did not realize the full extent of the
brokenness of these responses the first time when Vary was discussed.

> for
> all intents and purposes... it just has "compression" 
> applied. One cannot possibly become "stale" without the
> other also being "stale" at the same exact moment in time.

HTTP does not make this strict freshness relation between entities of
the same URI, but that's a different question and generally not a big
problem.

> At the moment... yes... I do... but if you read my other posts I
> also have a feeling the reason I can't quote you Verse and Chapter
> from an RFC is because I have a sneaking suspicion that there
> is something "missing" from the ETag/Vary scheme that can 
> lead to problems like this... and it's NOT IN ANY RFC YET.

And what I am saying is that Apache mod_deflate is violating a MUST
level requirement on ETag in the RFC, thereby making the caching section
of the same RFC break down.

> In other words... you may be doing exactly what hours and hours
> of reading an RFC seems to be telling you you SHOULD do... but
> there still might be something "else" that OUGHT to be done.

And I am telling you that this part of the RFC is complete, save for the
small detail that the server cannot signal that both the compressed and
identity encodings become stale when one changes, only one at a time.

> There will always be the chance that some upstream server will
> ( mistakenly? ) keep the same (strong) ETag on a compressed
> variant.

True, there will always be non-compliant implementations out there in
various forms, and they will continue causing problems at least for as
long as it's about MUST level violations. In many cases (this one
included) workarounds can be found, but that does not justify
non-compliant implementations intentionally remaining non-compliant
once informed about the problem.

> People are not perfect and they make mistakes. I still
> think that even when that happens any caching software should
> follow the "be lenient in what you accept and strict in what you
> send" rule and still use the other information available to it

Which in this case is none. The only information we ever get from Apache
is the ETag of the supposedly valid to use response, and possibly new
freshness details about the same.

> ( sic: What the client really asked for and expects ) and 
> "do the right thing". Only the cache knows what the client
> is REALLY asking for.

There is a pretty clear distinction in the RFC on this. Caches obey
Vary, origin servers obey Accept-XXX. Yes, it can be argued whether
that's the best way of designing a protocol, but it's how it is specified
and as long as at least the MUST level requirements are implemented it
works out reasonably well.
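
As a rough sketch of the cache side of that split: variant_key() and the
store dictionary are hypothetical names, and a real cache normalizes
header values far more carefully. The cache keys each stored response on
the URL plus the request headers that the response's Vary names.

```python
def variant_key(url, vary, request_headers):
    """Build a cache key from the URL plus the request headers named in Vary."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url, tuple((n, lowered.get(n, "")) for n in names))


store = {}  # hypothetical variant store
store[variant_key("/page", "Accept-Encoding", {"Accept-Encoding": "gzip"})] = "gzip body"

# A request without Accept-Encoding maps to a different key, so it misses.
plain_hit = store.get(variant_key("/page", "Accept-Encoding", {}))
gzip_hit = store.get(variant_key("/page", "Accept-Encoding", {"accept-encoding": "gzip"}))
```

Choosing which entity to generate for a given Accept-Encoding stays with
the origin server; the cache only matches requests against what Vary
names.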

Regards
Henrik
