httpd-dev mailing list archives

Subject Re: mod_proxy distinguish cookies?
Date Wed, 05 May 2004 16:07:25 GMT

Hi Neil...
This is Kevin Kiley...

Personally, I don't think this discussion is all that OT for
Apache but others might disagree.

"Vary:" is still a broken mess out there and if 'getting it right'
is still anyone's goal then these are the kinds of discussions
that need to take place SOMEWHERE. Apache is not the
W3C but it's about as close as you can get.

I haven't looked at this whole thing for a LOOONG time so
I had to go back and check my notes regarding the 
MSIE 'User-Agent' trick.

As absurd as it sounds... you actually got the point.

"User-Agent:" IS, in fact, supposed to be a 'request-side'
header but when it comes to "Vary:"... the world can
turn upside down and what doesn't seem to make any
sense can actually WORK.

Unfortunately... I can't find the (old) notes I had about
exactly what I did to make the "Vary: User-Agent" trick
actually work with MSIE. I was just mucking around and
never had any intention of implementing this as a solution
for anything but I DO remember somehow making it WORK
( almost ) just the way you are doing it.

If I have some time... I'll try to find those notes and the
test code I know I had somewhere that WORKED.

Another fellow who just responded pointed out that
"Content-Encoding:" seems to be another field that
MSIE will actually react to when it comes to VARY.

Well... it had been so long since I mucked with all
this I had to go back and find/read some notes.

The fellow who posted is SORT OF right about
"Content-Encoding:" LOOKING like it can "Vary:"
but it's not really "Vary:" at work at all.

The REALITY is explained in that link I already
supplied in last message...

Unless there has been some major change or patch to MSIE 6.0
and above then I still stand by my original research/statement...

MSIE will treat ANY field name OTHER than "User-Agent"
that arrives with a "Vary:" header on a non-compressed
response as if it had received
"Vary: *" ( Vary: STAR ) and it will NOT CACHE that response
locally. Every reference to the page ( via Refresh, Back-button,
local hyperlink-jump, whatever ) will cause MSIE to go all
the way upstream for a new copy of the page EVERY TIME.

Maybe this is really what you want? Dunno.

The reason it also LOOKS like "Content-Encoding" is 
being accepted as a VARY and MSIE is sending out
an 'If-Modified-Since:' on those pages is NOT because
it is doing "Vary:"... it's for other strange reasons.

Whenever MSIE receives a compressed response
( Content-encoding: gzip ) then it will ALWAYS
cache that response... even if it has been specifically
told to NEVER do that ( no-cache, Expires: -1 , whatever ).

It HAS to. MSIE ( and Netscape ) MUST use the CACHE FILE
to DECOMPRESS the response... and it always KEEPS
it around.

Neither MSIE nor Netscape nor Opera are able to 'decompress'
in memory. They all MUST have a cache file to work from
even if they are not supposed to EVER cache that 
particular response. They just do it anyway.

So... to make a long story short... MSIE will always 
decide it MUST cache a response with any kind of
"Content-Encoding:" on it and it will set the cache 
flags for that puppy to 'always-revalidate' and that's
where the "If-Modified-Since:" output is coming from
which makes it LOOK like "Vary:" is involved...
but it is NOT.
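
To restate the two behaviors described above in one place, here is a
rough Python sketch. It only models what this message claims about
MSIE and is not taken from MSIE itself; the function name and the
three return labels are made up for illustration.

```python
def msie_local_cache_policy(headers):
    """Model the MSIE behavior claimed in this message (hypothetical helper).

    headers: dict mapping response header names to values.
    Returns "cache-always-revalidate", "no-cache" or "cache-normally".
    """
    h = {k.lower(): v for k, v in headers.items()}

    # Claim 1: a compressed response is always written to the cache file
    # and flagged 'always-revalidate', whatever the caching headers say.
    if h.get("content-encoding", "").strip():
        return "cache-always-revalidate"

    # Claim 2: on a non-compressed response, any Vary field name other
    # than User-Agent is treated like "Vary: *" and is not cached locally.
    vary = [f.strip().lower() for f in h.get("vary", "").split(",") if f.strip()]
    if any(f != "user-agent" for f in vary):
        return "no-cache"

    return "cache-normally"
```

Under this model a response carrying "Vary: Cookie" falls into the
"no-cache" branch, while one carrying "Content-Encoding: gzip" is kept
and revalidated even if it also says "Cache-Control: no-cache".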

However... in the world of "Vary:" you run into this snafu
whereby you can't differentiate between what you are
trying to tell an inline Proxy Cache 'what to do' versus
an end-point user-agent.

Example: If you are a COS ( Content Origin Server ) and
you want a downstream Proxy Cache to 'Vary' the 
( non-expired ) response it might give out according to
whether a requestor says it can handle compression
or not ( Accept-encoding: gzip, deflate ) then the right
VARY header to add to the response(s) is

"Vary: Accept-Encoding"

and not 

"Vary: Content-Encoding".

The "Content-Encoding" only comes FROM the Server.
The 'decision' you want the Proxy Cache to make can
only be based on whether a requestor has sent
"Accept-Encoding: gzip, deflate" ( or not ).
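
A minimal Python sketch of why the field name matters, assuming a
simplified proxy cache that keys stored variants on the URL plus the
request headers named in "Vary:". The function variant_key and the
sample requests are hypothetical; real caches also normalize header
values and do more than this.

```python
def variant_key(url, request_headers, vary_value):
    """Build a cache key from the URL plus the request headers named in Vary."""
    req = {k.lower(): v.strip() for k, v in request_headers.items()}
    names = sorted(f.strip().lower() for f in vary_value.split(",") if f.strip())
    if "*" in names:
        return None  # "Vary: *": a stored response can never be reused
    # A header the request did not send contributes an empty value.
    return (url,) + tuple((n, req.get(n, "")) for n in names)


gzip_client = {"Accept-Encoding": "gzip, deflate"}
plain_client = {}  # a requestor that does not advertise compression

# Vary: Accept-Encoding names a REQUEST header, so the two requestors
# get different keys and the cache can hold one variant for each.
print(variant_key("/page", gzip_client, "Accept-Encoding")
      == variant_key("/page", plain_client, "Accept-Encoding"))   # False

# Vary: Content-Encoding names a header that requests do not carry,
# so both requestors collapse onto the same key.
print(variant_key("/page", gzip_client, "Content-Encoding")
      == variant_key("/page", plain_client, "Content-Encoding"))  # True
```

With "Vary: Content-Encoding" the cache has nothing on the request
side to compare, so it cannot tell the two requestors apart.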

If there is no inline Proxy ( which is always impossible to tell )
and response is direct to browser then the same "Vary:"
header that would 'do the right thing' for a Proxy Cache
is meaningless for the end-point user-agent itself.

The User-Agent never 'varies' its own 'Accept-Encoding:'
output header ( unless you are using Opera and clicking
all those 'imitate other browser' options in-between requests
for the same resource ).

One of the biggest mis-conceptions out there is that browsers
are somehow REQUIRED to obey all the RFC standard 
caching rules as if they were HTTP/x.x compliant Proxy Caches.

They are NOT. The RFCs themselves say that end-point
user agents can be 'implementation specific' when it comes
to caching and should not be considered true "Proxy Caches".

Most major browsers DO 'follow the rules' ( sort of ) but 
none of them could be considered true HTTP compliant
caches when it comes to what they are doing locally.

Example: If Netscape receives compressed data it uses
the cache itself as 'workspace' and keeps both the
compressed and the uncompressed versions of the
response around... but it does NOT do so according
to any known rules of storing multi-variants. It is just
using the cache as 'workspace' and God knows how
it keeps it all straight. Sometimes it doesn't even
do that. If you hit the 'print' button on Netscape then
sometimes it 'forgets' that it has 2 valid variants of
the same response and it tries to PRINT the
COMPRESSED cache file. Not good.

Anyway... this message is too long.

If anyone is even remotely interested in any of this I 
still suggest reading the following link...

It represents a summary of hours and hours I spent with 
ICE machines and debuggers watching exactly what 
MSIE 4.x - 6.x can/cannot do ( and other major browsers )
and is a good summary of what most people aren't 
even aware of.

I would LIKE to be DEAD WRONG about a lot of this
so if anyone has test results that show anything 
different please post them somewhere ( but not
necessarily HERE ).

Bottom line:

In order to do your 'Cookie' scheme and have it work with
all major browsers you might have to give up on the idea
that the responses can EVER be 'cached' locally by
a browser... but now you also lose the ability to have
it cached by ANYONE.

There is no HTTP caching control directive that says...

Cache-Control: no-cache-only-if-endpoint-user-agent

Given the caching issues in most 'end-point' browsers...
There probably should be such a directive.

The ONLY guy you don't want to cache it is the
end-point browser itself... but you DO want the
response available from other nearby caches so
your Content Origin Server doesn't get hammered
to death.


Original message... wrote:
> If this fellow were to simply 'stuff' his Cookie into the
> 'extra text' part of the User-Agent: string and send
> out a "Vary: User-Agent" along with the response
> then it would actually work the way he expects it to.

Thanks to Roy and Kevin for your insight. Sorry if this thread is
perhaps a bit off-topic for this list, but I hope you can indulge me
just a little longer. When I saw Roy's response regarding the 'Vary'
header, I thought that this would be exactly what I was after - you
could set 'Vary: Cookie' and then the browser would see that it should
reget the page if the cookie has changed. But this didn't seem to work
at all in practice. I am testing with the following sequence:

1. Get a page, which has Cache-Control and Expires headers set so that
it will be cached
2. Go to another page, where I use a form to change the option cookie
3. The options form sets the cookie and redirects the browser back to
the original page
4. The original page is displayed, not new version - browser doesn't

I have set all the headers, this is an example:

shell> HEAD
200 OK
Cache-Control: must-revalidate; s-maxage=900; max-age=901
Connection: close
Date: Wed, 05 May 2004 16:08:34 GMT
Server: Apache
Vary: Cookie
Content-Length: 7020
Content-Type: text/html
Expires: Wed, 05 May 2004 16:23:35 GMT
Last-Modified: Wed, 05 May 2004 16:08:34 GMT
Client-Date: Wed, 05 May 2004 16:08:35 GMT
Client-Response-Num: 1
MSSmartTagsPreventParsing: TRUE

So I am setting the Cache-Control to cache the page, and the client is
directed to revalidate. I say in the Vary header that Cookie header must
be taken into account. But the browser simply fails to revalidate the
original page at all. If I manually refresh then it gets the correct
version, but I can't control manual refreshes (or user options) on the
browser end. I would simply love to be able to hit that "sweet spot"
where the browser caches the page, but also sees that some magic
component has changed and thus the old version of the page in the cache
cannot be used any more.

When I saw Kevin's response, it made perfect sense at first, because
what he describes is exactly what I experienced above. Neither Mozilla
1.4 nor IE 6 appears to take any notice of the 'Vary: Cookie' header. I
decided to try Kevin's suggestion re the User-Agent field, but after
looking at this further I am very confused. The User-Agent field is
something that is passed in *from* the client, not *to* the client from
a server. Why would IE or any other client even look at a User-Agent
field? Ok, ok, I understand, the whole point is that this is a "hack",
but even so it doesn't seem to work for me. I tried setting the
User-Agent field:

shell> HEAD
200 OK
Cache-Control: must-revalidate; s-maxage=900; max-age=901
Connection: close
Date: Wed, 05 May 2004 16:08:34 GMT
User-Agent: Mozilla/4.0 (compatible; opts=300)
Server: Apache
Vary: User-Agent
Content-Length: 7020
Content-Type: text/html
Expires: Wed, 05 May 2004 16:23:35 GMT
Last-Modified: Wed, 05 May 2004 16:08:34 GMT
Client-Date: Wed, 05 May 2004 16:08:35 GMT
Client-Response-Num: 1
MSSmartTagsPreventParsing: TRUE

As you can see, I've encoded the opts cookie into the User-Agent header.
Am I doing this right? Nothing appears to change, indeed now IE doesn't
even get the proper version when I hit 'Refresh'. Maybe I'm being dense
and didn't read the instructions correctly, but it seemed like this was
what was being suggested.

Once again, I apologize if this is overly obvious or off-topic, but I
have the feeling that I'm just missing something obvious here. Any
insight would be much appreciated. In summary, the problem currently
appears to be that neither Mozilla nor IE appears to even want to
revalidate the original page after the cookie has changed. When the
browser is redirected back to the original page (using identical URL)
from the options form, both browsers just use their cached version,
without even touching the server at all. No request, nothing. When I use
the 'Vary: Cookie' header, then manually refreshing does get the new
version. I know that browser settings can determine how often the
browser revalidates the page, but I can't tell random users on the
internet to change their settings for my site. I would have thought that
it should be possible for a page to be cached, and yet still be
invalidated by the cookie (or, in the general case, some 'Vary' header).

Anyway, thanks again...

