httpd-dev mailing list archives

From TOKI...@aol.com
Subject Re: [PATCH] mod_disk cached fixed
Date Thu, 05 Aug 2004 12:32:51 GMT

> Brian Akins wrote...
>
> >TOKILEY@aol.com wrote...
> >
> >
> > > Brian Akins wrote...
> > >
> > > Serving cached content:
> > >
> > > - lookup uri in cache (via md5?).
> > > - check varies - a list of headers to vary on
> > > - calculate new key (md5) based on uri and client's value of these headers
> > > - lookup new uri in cache
> > > - continue as normal
> >
> > Don't forget that you can't just 'MD5' a header from one response and
> > compare it to an 'MD5' value for the same header field from another response.
> >
>
> This isn't what I meant.  I mean get the "first-level" key by the md5 of 
> the uri, not the headers.
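
Just so we are picturing the same thing... that 2-level lookup
would go something like the rough sketch below. ( Python only
because it is compact... every name in it is invented for
illustration and it is NOT code from any real module. )

import hashlib

def primary_key(uri):
    # First-level key: digest of the URI alone.
    return hashlib.md5(uri.encode("utf-8")).hexdigest()

def variant_key(uri, vary_names, request_headers):
    # Second-level key: digest of the URI plus the client's values
    # of the header fields named in the stored 'Vary:' list.
    hdrs = {k.lower(): v for k, v in request_headers.items()}
    parts = [uri] + [n.lower() + ":" + hdrs.get(n.lower(), "")
                     for n in sorted(vary_names, key=str.lower)]
    return hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()

def lookup(cache, uri, request_headers):
    # 'cache' is just assumed to be something dict-like here.
    meta = cache.get(primary_key(uri))      # holds the 'Vary:' list
    if meta is None:
        return None                         # total miss
    if not meta["vary"]:
        return meta.get("body")             # no variants... done
    return cache.get(variant_key(uri, meta["vary"], request_headers))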

Ok... fine... but when you wrote this...

"caculate new key (md5) based on uri AND clients value of these headers"

The AND is what got me worried.

I thought you were referring to the scheme you proposed in an
earlier email where you WERE planning on doing just that...

> Brian Akins wrote...
>
> I actually have somewhat of a solution:
>
> URL encode the uri and any vary elements:
> www.cnn.com/index.html?this=that
> Accept-Encoding: gzip
> Cookie: Special=SomeValue
>
> may become:
>
> www.cnn.com%2Findex.html%3Fthis%3Dthat+Accept-Encoding%3A+gzip+Cookie%3A+Special%3DSomeValue
>
> A very simple hashing function could put this in some directory 
> structure, so the file on disk may be:
>
>
> /var/cache/apache/00/89/www.cnn.com%2Findex.html%3Fthis%3Dthat+Accept-Encoding%3A+gzip+Cookie%3A+Special%3DSomeValue
>
> Should be pretty fast (?) if the urlencode was efficient.
>
> Brian Akins
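
For anyone following along... that earlier scheme boils down to
something like the following rough sketch. ( Again, Python just to
show the mechanics... the helper name and the 2-level directory
split are only illustrative, not anything from a real module. )

from urllib.parse import quote_plus
import hashlib
import os

def cache_path(cache_root, uri, vary_headers):
    # URL-encode the uri and the vary'd header fields into one flat
    # name ( exactly the flattening shown above ), then spread the
    # files across a 2-level directory tree using a hash of that name.
    name = quote_plus(uri)
    for field, value in vary_headers:
        name += "+" + quote_plus(field + ": " + value)
    digest = hashlib.md5(name.encode("ascii")).hexdigest()
    return os.path.join(cache_root, digest[:2], digest[2:4], name)

# cache_path("/var/cache/apache",
#            "www.cnn.com/index.html?this=that",
#            [("Accept-Encoding", "gzip"), ("Cookie", "Special=SomeValue")])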

Not that this wouldn't actually WORK under some circumstances
( It might ) but it would qualify as just a 'hack'. It wouldn't
qualify as a good way to perform RFC-standard 'Vary:' handling.

> Brian Akins also wrote...
> >
> > > BrowserMatch ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav" no-gzip
> > >
> > > and just "vary" on no-gzip (1 or 0), but this may be hard to do just
> > > using headers...
> >
> > It's not hard to do at all... question would be whether it's ever
> > the 'right' thing to do.
> >
>
> If you know a lot about the data you can do this.  In "reverse proxy" 
> mode, you would.

Take your logic just a tiny step farther.

If you know EVERYTHING about the data... then you CERTAINLY can/would.

You have just hit on something that should probably be discussed
further.

The whole reason "Vary:" was even created was so that COS ( Content
Origin Servers ) could tell downstream caches to come back upstream
for a 'freshness' check for reasons OTHER than simple time/date
based 'expiration' rules.

I am not certain but I believe it was actually the whole "User-Agent:"
deal that made it necessary. When it became obvious that different
major-release browsers had completely different levels of HTTP
support, and that HTML which might work for one would puke on another,
it became necessary to have 'Multi-Variants' of the same
response. I am sure the 'scheme' was intended to ( and certainly
will ) handle all kinds of other situations ( Cookie values would 
be second, I guess ) but IIRC there was no more pressing issue 
for 'Vary:' and 'Multiple Variants of a request to the same URI' 
than to solve the emerging 'User-Agent:' nightmare.

So that's all well and good.

There really SHOULD be a way for any cache to hold 2 different
copies of the same non-expired page for both MSIE and Netscape,
when the only reason to do so is that the HTML that works for
one (still) might not work for the other.

But that leads back to YOUR idea ( concern )...

When does a response REALLY (actually) Vary and
why should you have to store tons and tons of 
responses all the time?

That's easy... when the entire response for the same
URI differs in any way from an earlier ( non-expired )
response to a request for the same URI... only then
does it 'actually Vary'.

If you MD5 and/or hard-CRC/checksum the actual BODY
DATA of a response and it does not differ one iota
from another (earlier) non-expired response to a
request for the same URI... then those 2 'responses'
DO NOT VARY.

It is only when the RESPONSE DATA itself is 'different'
that it can be said the responses truly 'Vary'.

So here is the deal...

Even if you get 25,000 different 'User-Agents' asking
for the same URI... there will most probably only be
a small sub-set of actual RESPONSES coming from the
COS. It is only THAT sub-set of responses that need
to be stored by a cache and 'associated' with the
different ( Varying ) User-Agents.

So that doesn't mean a (smart) cache needs to store
25,000 variants of the same response... It only needs
to STORE responses that ACTUALLY VARY.

How the sub-sets of 'Varying' responses get 'associated'
with the right set(s) of 'Varying' header field(s)
( e.g. User-Agent ) is something that the 'Vary:' scheme
lacks and was not considered in the design.
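
To make that concrete... the kind of bookkeeping I mean looks
roughly like the sketch below. ( Every name is invented and none
of this is real cache-module code... the point is just that the
BODIES are stored once per distinct digest while each 'Varying'
header signature only stores a small pointer to whichever body
it happened to map to. )

import hashlib

class VariantCache:
    def __init__(self):
        self.bodies = {}     # body digest -> the actual body bytes
        self.variants = {}   # (uri, header signature) -> body digest

    def _signature(self, vary_names, request_headers):
        # The client's values of the headers named in 'Vary:'.
        hdrs = {k.lower(): v for k, v in request_headers.items()}
        return tuple((n.lower(), hdrs.get(n.lower(), ""))
                     for n in sorted(vary_names, key=str.lower))

    def store(self, uri, vary_names, request_headers, body):
        # Only keep a body we have never seen before... 25,000
        # different User-Agents can map onto a handful of these.
        digest = hashlib.md5(body).hexdigest()
        self.bodies.setdefault(digest, body)
        key = (uri, self._signature(vary_names, request_headers))
        self.variants[key] = digest

    def fetch(self, uri, vary_names, request_headers):
        key = (uri, self._signature(vary_names, request_headers))
        digest = self.variants.get(key)
        return None if digest is None else self.bodies[digest]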

Topic for discussion?
Kindling for a flame war?
Not sure... but you raise an interesting question.

> Brian Akins also wrote...
> 
> > TOKILEY@aol.com wrote...
> >
> > That's why it (Vary) remains one of the least-supported 
> > features of HTTP.
> >
> Squid supports it really well.

Look again.

Only the most recent versions of SQUID make any attempt
to support 'Vary:' at all. Prior to just a release or
2 ago... and for years and years... SQUID would simply
treat ANY response that had a 'Vary:' header of any
kind as if it were 'Vary: *' ( Vary: STAR ) and would
refuse to cache it.

There is NOW ( finally ) SOME support for 'Vary:' in
SQUID and it actually WILL hold a (limited) number 
of Multi-Variants... but last time I checked there
still is no support for 'ETag:'... which is an
essential part of supporting 'Vary:'.

So I wouldn't say SQUID supports 'Vary:' very well...
I would say it supports it 'pretty well'. Full ETag 
support would bring it up to 'very well'.

It certainly supports 'Vary:' better than just about
any other major caching product. Most of those (still) haven't
even BEGUN to support Vary.

The 'end point' browser caches are a whole 'nother can
of worms when it comes to Vary.

MSIE makes absolutely no attempt to support 'Vary:' and
will still do what SQUID was doing for years and years:
it treats ANY response that shows up with any 'Vary:'
header at all as if it were 'Pragma: no-cache' and
refuses to even try and cache it locally.

Not sure what Netscape is doing these days but last time
I checked they were in the same boat. Any response
with any 'Vary:' header at all is treated as if it
were 'Vary: *' ( Vary: STAR ) and is NOT CACHED.

That means that even though 'Vary:' might allow a
downstream Proxy Cache to 'hold' multiple variants of 
a response... that very advantage gets lost when
you realize that ( most ) browsers are now totally
losing the ability to cache the response locally
and are coming back 'upstream' to the cache each
and every time. The 'last mile' traffic increases
geometrically whenever 'Vary:' is involved.

So what does all this long-windedness mean to you?


I would just keep in mind that to whatever extent
you try to add 'Vary:' support just be sure there
is a way for end-users of the product to fully
turn this support OFF if they so choose.

The increased 'last-mile' traffic whenever 'Vary:'
is used might get so bad for them that it isn't
even worth allowing it. They might RATHER have
single-variants held in caches and have the 
inline cache going back upstream rather than have
all their browsers unable to cache anything and
hammering the hell out of the cache.

It should remain THEIR choice, no matter what.

Yours
Kevin Kiley

PS: I guess what worries me about some of the current
discussion is that it seems to be taking the same
'design track' that the Apache 2.0 filtering scheme
did. There came a point in the 2.0 design discussions
where this whole 'compression' thing was the ONLY
example anyone could think of for the entire scheme...
and it led to some trouble.

It just seems to be happening all over again and I
will throw out the same warning I did back then
( which will probably also be totally ignored, again ).

If adding 'Vary:' support to mod_whatever is based
TOO MUCH on just solving the 'How do we store a
compressed and non-compressed variant for the same
URI?' question, then watch out that you don't create problems
for OTHER scenarios ( filters ).

'Accept-Encoding: gzip' is supposed to be all you
need to 'Vary:' on. It will work in MOST cases.
Browsers are not supposed to LIE about their ability
to 'accept' certain encodings under ALL circumstances.
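
( In the simple case, 'varying' on it boils down to reducing the
client's Accept-Encoding to a yes/no answer before it ever becomes
part of any cache key... something like this rough sketch, which is
purely illustrative: )

def client_accepts_gzip(request_headers):
    # The simple-case test that 'Vary: Accept-Encoding' leans on:
    # is gzip listed, with a non-zero q value, in the client's
    # Accept-Encoding header?
    ae = request_headers.get("Accept-Encoding", "")
    for token in ae.split(","):
        parts = [p.strip() for p in token.split(";")]
        if parts[0].lower() in ("gzip", "x-gzip", "*"):
            q = 1.0
            for p in parts[1:]:
                if p.lower().startswith("q="):
                    try:
                        q = float(p[2:])
                    except ValueError:
                        q = 0.0
            return q > 0.0
    return False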

In reality... that isn't enough. All browsers are
LYING about 'Accept-Encoding:' under one scenario
or another... but that is not, necessarily, something
that can ever be 'solved' using 'Vary:' alone.

'Vary:' was never designed to solve such a problem.

I have had hours of discussions with various authors
of the SQUID product about this ( Rob Collins, etc. )
and opinions differ as to whether 'Vary:' alone can 
ever really solve this 'Browsers are LIARS' issue. 
Some of them think it can, others don't. 

The fat lady hasn't sung yet.

If you are at all interested... there is a long, long
discussion about compression, SQUID and 'Vary:' that
took place with SQUID authors contributing on the
mod_gzip (public) forum. This was over 2 years ago and it was
BEFORE there was even a version of SQUID that made
any attempt to support 'Vary:'.
