From: tokiley@aol.com
To: dev@httpd.apache.org
Cc: tokiley@aol.com
Subject: Re: mod_cache, mod_deflate and Vary: User-Agent
Date: Thu, 27 Aug 2009 03:50:43 -0400
In-Reply-To: <4A958340.3090403@rowe-clan.net>
Message-Id: <8CBF50236ADB6D0-20A8-615F@webmail-d068.sysops.aol.com>

> William A. Rowe, Jr.
>
> I think we blew it :)
>
> Vary: user-agent is not practical for correcting errant browser behavior.

You have not 'blown it'.

From a certain perspective, it's the only reasonable thing to do.

Everyone keeps forgetting one very important aspect of this issue
and that is the fact that the 'Browsers' themselves are
participating in the whole 'caching' scheme and that they
are the source of the actual requests, so their behavior is
as much a part of the equation as any inline proxy cache.

There is no real solution to this problem.

The HTTP protocol itself does not have the capability
to deal with things correctly with regards to
compressed variants.

The only decision that anyone needs to make is 'Where is
the pain factor?'.

If you VARY on ANYTHING other than 'User-Agent' then this
might show some reduction of the pain factor at the proxy
level but you have now exponentially increased the pain
factor at the infamous 'Last Mile'.

Most modern browsers will NOT 'cache' anything that has
a 'Vary:' header naming any field OTHER than 'User-Agent'.
This is as true today as it was 10 years ago.

The following discussion involving myself and some of the
authors of the SQUID Proxy caching Server took place just
short of SEVEN (7) YEARS ago but, as unbelievable as it might
seem, is still just as relevant ( and unresolved )...

http://marc.info/?l=apache-modgzip&m=103958533520502&w=2

It's way too long to reproduce here but here is just
the SUMMARY part. You would have to access the link
above to read all the gory details...

[snip]

> Hello all.
>
> This is a continuation of the thread entitled...
>
> [Mod_gzip] "mod_gzip_send_vary=Yes" disables caching on IE
>
> After several hours spent doing my own testing with MSIE and
> digging into MSIE internals with a kernel debugger I think I
> have the answers.
>
> The news is NOT GOOD.
>
> I will start with a SUMMARY first for those who don't have the
> time to read the whole, ugly story but for those who want to
> know where the following 'conclusions' are coming from I
> refer you to the rest of the message and the "detail".
>
> SUMMARY
>
> There is only 1 request header value that you can use with
> "Vary:" that will cause MSIE to cache a non-compressed
> response and that is ( drum roll please ) "User-Agent".
>
> If you use ANY other (legal) request header field name in
> a "Vary:" header then MSIE ( Versions 4, 5 and 6 ) will
> REFUSE to cache that response in the MSIE local cache.
>
> This is why Jordan is seeing a caching problem and Slava
> is not. Slava is 'accidentally' using the only possible "Vary:"
> field name that will cause MSIE to behave as it should
> and cache a non-compressed response.
>
> Jordan is seeing non-compressed responses never being
> cached by MSIE because the responses are arriving
> with something other than "Vary: User-Agent" like
> "Vary: Accept-Encoding".
>
> It should be perfectly legal and fine to send "Vary: Accept-Encoding"
> on a non-compressed response that can 'Vary' on that field
> value and that response SHOULD be 'cached' by MSIE...
> but so much for assumptions. MSIE will NOT cache this response.
>
> MSIE will treat ANY field name other than "User-Agent"
> as if "Vary: *" ( Vary + STAR ) was used and it will
> NOT cache the non-compressed response.
>
> The reason the COMPRESSED responses are, in fact,
> always getting cached no matter what "Vary:" field name
> is present is just as I suspected... it is because MSIE
> decides it MUST cache responses that arrive with
> "Content-Encoding: gzip" because it MUST have a
> disk ( cache ) file to work with in order to do the
> decompression.
>
> The problem exists in ALL versions of MSIE but it's
> even WORSE for any version earlier than 5.0. MSIE 4.x
> will not even cache responses with "Vary: User-Agent".
>
> That's it for the SUMMARY.
>
> The rest of this message contains the gory details.

[/snip]
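
To make the SUMMARY above easier to follow, here is a rough sketch of
the MSIE cache decision as that summary reports it. This is only a
model of the reported behavior, not MSIE code. The function name and
its parameters are made up for illustration, and the 'no Vary header'
case is an assumption since the summary does not cover it.

```python
def msie_would_cache(major_version, vary, content_encoding=None):
    """Model the MSIE 4/5/6 local-cache decision as reported in the summary.

    Hypothetical helper for illustration; not derived from MSIE source.
    """
    # Compressed responses are reportedly always cached, because MSIE
    # needs a disk (cache) file to decompress from.
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return True

    # Assumption: a response with no Vary header is cacheable.
    if not vary:
        return True

    fields = {f.strip().lower() for f in vary.split(",") if f.strip()}

    # Any field name other than User-Agent is treated like "Vary: *".
    if fields - {"user-agent"}:
        return False

    # MSIE 4.x reportedly refuses to cache even "Vary: User-Agent".
    return major_version >= 5


if __name__ == "__main__":
    print(msie_would_cache(6, "User-Agent"))               # Slava's case
    print(msie_would_cache(6, "Accept-Encoding"))          # Jordan's case
    print(msie_would_cache(6, "Accept-Encoding", "gzip"))  # compressed
```

Slava's case comes out cacheable and Jordan's does not, which matches
what both of them were seeing.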

I participated in another lengthy 'offline' discussion about
all this some 3 or 4 years ago again with the authors of
SQUID. There was still no real resolution to the problem.

The general consensus was that if there is always going to
be a 'pain factor' then it's better to follow one of the
rules of Networking and assume the following...

"The least amount of resources will always be present
the closer you get to the last mile."

In other words... it's BETTER to live with some redundant
traffic at the proxy level, where the equipment and bandwidth
are usually more robust and closer to the backbone, than to put
the pain factor onto the 'last mile' where resources are usually
more constrained.

If anyone is going to start dropping some special code
anywhere to 'invisibly handle the problem' my suggestion
would be to look at coming up with a scheme that undoes
the damage these out-of-control redundant 'User-Agent' strings are
causing. The only thing a proxy cache really needs to know is
whether a certain 'User-Agent' string represents a
different level of DEVCAP than another one. If all that
is changing is a version number and there is no change
with regards to actual Device Capabilities then there's
no reason to cache a separate response for that User Agent.

That still wouldn't represent the ultimate 'fix' for this
multi-variant caching issue... but it sure would be a
step in the right direction.
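
A minimal sketch of that idea, assuming a proxy could map each
'User-Agent' string to a capability class and key its cache on the
class instead of the raw string. The rule table, the class names and
both function names are hypothetical placeholders, not real DEVCAP
data and not anything that exists in Apache or SQUID.

```python
import re

# Hypothetical rule table: (pattern, capability class). Illustrative only;
# a real table would have to come from actual device-capability testing.
CAPABILITY_RULES = [
    (re.compile(r"MSIE [1-4]\."), "no-gzip"),
    (re.compile(r"Firefox/\d+|MSIE [5-9]\."), "gzip"),
]


def capability_class(user_agent):
    """Map a raw User-Agent string to a coarse capability class."""
    for pattern, cls in CAPABILITY_RULES:
        if pattern.search(user_agent):
            return cls
    # Unrecognized agents share one conservative class.
    return "unknown"


def cache_key(url, user_agent):
    """Key the cache on the capability class, not the raw User-Agent."""
    return (url, capability_class(user_agent))


if __name__ == "__main__":
    a = "Mozilla/5.0 Gecko/20090729 Firefox/3.5.2"
    b = "Mozilla/5.0 Gecko/20090715 Firefox/3.5.1"
    # Only the version numbers differ, so one cached variant serves both.
    print(cache_key("/index.html", a) == cache_key("/index.html", b))
```

Two agents that differ only in version number land on the same key,
so the proxy stores one variant for them instead of two.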

Yours...
Kevin Kiley

BTW: This posting doesn't even come anywhere near the
real issue which is that even Browsers that 'appear'
to not be able to support 'Accept-Encoding: gzip, deflate'
usually CAN... but it's actually all about MIME TYPES.
The HTTP protocol does NOT provide a way for a client to
indicate WHICH mime types it can or cannot 'decompress'.
Browsers that appear 'broken' with regards to decompression
are actually only 'broken' for certain MIME types.

That's a completely separate discussion and I'm not
going to 'go there' tonight.


-----Original Message-----
From: William A. Rowe, Jr. <wrowe@rowe-clan.net>
To: dev@httpd.apache.org <dev@httpd.apache.org>
Sent: Wed, Aug 26, 2009 1:47 pm
Subject: mod_cache, mod_deflate and Vary: User-Agent

I think we blew it :)

Vary: user-agent is not practical for correcting errant browser behavior.

For example:

User-Agent: Mozilla/5.0 Gecko/20090729 Firefox/3.5.2

produces myriad 'variant' flavors when Vary is tagged with the
User-Agent to determine whether the deflate/gzip compressed variant
or the uncompressed variant should be served.

What we really meant to do was to determine which Accept-Encoding values
were invalid based on known browser bugs, and -remove them- from the A-E
header *prior* to determining the cache handling (quick handler hook) or
typical content handling.

Which implies that setenvif + headers need an extra chance to run really
first in front of the quick handler.

Any better suggestions?
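
The proposal in the original message (strip the unusable
Accept-Encoding values before the cache lookup) might be sketched
roughly as follows. This is plain Python for illustration and not
httpd module code. The browser name in the table is a made-up
placeholder, the function name is hypothetical, and the headers are
assumed to arrive as a dict with canonical capitalization.

```python
import re

# Hypothetical table of agents with known-bad codings. The entry below is
# a placeholder, not a statement about any real browser.
BROKEN_ENCODINGS = [
    (re.compile(r"ExampleBrokenBrowser/1\."), {"gzip", "deflate"}),
]


def sanitize_accept_encoding(headers):
    """Return a copy of headers with unusable Accept-Encoding codings removed.

    Meant to run before the cache lookup, so the cache can vary on
    Accept-Encoding alone instead of on User-Agent.
    """
    user_agent = headers.get("User-Agent", "")
    bad = set()
    for pattern, codings in BROKEN_ENCODINGS:
        if pattern.search(user_agent):
            bad |= codings

    kept = []
    for item in headers.get("Accept-Encoding", "").split(","):
        item = item.strip()
        if not item:
            continue
        # Drop any ";q=..." parameter before comparing the coding name.
        coding = item.split(";", 1)[0].strip().lower()
        if coding not in bad:
            kept.append(item)

    result = dict(headers)
    # Fall back to "identity" rather than dropping the header, because a
    # missing Accept-Encoding lets a server assume any coding is acceptable.
    result["Accept-Encoding"] = ", ".join(kept) if kept else "identity"
    return result


if __name__ == "__main__":
    print(sanitize_accept_encoding({
        "User-Agent": "ExampleBrokenBrowser/1.0",
        "Accept-Encoding": "gzip, deflate",
    }))
```

With the header cleaned up first, the number of cached variants is
bounded by the distinct Accept-Encoding values and not by the number
of distinct User-Agent strings.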



