Return-Path: Delivered-To: apmail-httpd-dev-archive@httpd.apache.org Received: (qmail 22826 invoked by uid 500); 24 Jun 2003 10:09:21 -0000 Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm Precedence: bulk Reply-To: dev@httpd.apache.org list-help: list-unsubscribe: list-post: Delivered-To: mailing list dev@httpd.apache.org Received: (qmail 12241 invoked from network); 24 Jun 2003 10:01:58 -0000 Message-ID: <45D4FFF3A2E8D41196FF0000F6FFF41B03B2AD95@gsys-s01.generali.fr.6.10.in-addr.arpa> From: CASTELLE Thomas To: dev@httpd.apache.org Subject: RE: [PATCH] mod_cache RFC compliance Date: Tue, 24 Jun 2003 12:01:18 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C33A37.8E975300" X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. ------_=_NextPart_001_01C33A37.8E975300 Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: quoted-printable Thanks for looking into this Paul ! Concerning the second question, I totally agree with you. I tested it = and it works. It is obviously more logical... I hope you will be able to integrate this patch in the next apache release... Thanks again, Thomas. =20 -----Message d'origine----- De : Paul J. Reder [mailto:rederpj@remulak.net] Envoy=E9 : mardi 17 juin 2003 05:21 =C0 : dev@httpd.apache.org Objet : Re: [PATCH] mod_cache RFC compliance I am reviewing this patch and have a few questions for Thomas or = someone in the know. The first has to do with Thomas' observation that Cache-Control is to = be found in r->err_headers_out rather than in r->headers_out... I looked = into this and ran into the following piece of code in mod_expires.c (expires_filter): /* * Check to see which output header table we should use; * mod_cgi loads script fields into r->err_headers_out, * for instance. */ expiry =3D apr_table_get(r->err_headers_out, "Expires"); if (expiry !=3D NULL) { t =3D r->err_headers_out; } else { expiry =3D apr_table_get(r->headers_out, "Expires"); t =3D r->headers_out; } This code later calls set_expiration_fields which adds Cache-Control to whichever headers t points to. Question 1: Does this imply that the cache code needs to look for ETags, = Cache-Control, etc in both places too? Right now the code all looks in r->headers_out. Thomas is changing some of them to look in r->err_headers_out instead. Is there a rule of thumb for determining when to check headers_out vs. err_headers_out? The second observation is related to the freshness computation = referenced below. I agree with Thomas that the code as it stands isn't quite = right, but I disagree that negating the logic fixes it. If I understand the RFC correctly (section 13.2.4), max_age and s-maxage take precedence = over an expires header. This would mean that, as Thomas points out, if an expires header is more liberal than the max_age value then it currently passes the freshness test even though it should fail. Question 2: Am I correct in my belief that the fix requires seperation of the = checks so that the maxage and s-maxage checks/computations happen first and = you only evaluate freshness using the expires value if there are no max-age = or s-maxage values present? (as in the following code) if (((smaxage !=3D -1) && (age < (smaxage - minfresh))) || ((maxage !=3D -1) && (age < (maxage + maxstale - minfresh))) = || ((smaxage =3D=3D -1) && (maxage =3D=3D -1) && (info->expire !=3D APR_DATE_BAD) && (age < (apr_time_sec(info->expire - info->date) + maxstale - minfresh)))) { /* it's fresh darlings... */ ... } In other words, if a value is specified for smaxage or maxage, the = freshness is determined solely based on those values (without regard to any = expires header). If there are no values specified for either maxage or smaxage = then the freshness is determined based on the expires header (if it exists). = Is that correct? By the way, mod_disk_cache has a lot of issues. If it isn't saving and retrieving the headers correctly, its a bug in the disk_cache code that needs to be fixed. I wouldn't look to mod_disk_cache as an example of how the caching code should be working. :) Thanks in advance for any insight that Thomas or anyone else has to = offer on these questions. Given a couple of answers I can have something committed Tuesday. Thanks for the research and patch Thomas. CASTELLE Thomas wrote: > Hello, >=20 > In order to accelerate the RFC compliance of mod_cache, I propose = these=20 > two patches which fix two problems : > - It doesn't handle the Cache-Control directives (max-age, max-stale, = > min-fresh...) properly. > - It doesn't send a "If-Modified-Since" to avoid that the backend = server=20 > sends every time a 200 response where a 304 would be enough. >=20 > Actually, we are waiting for these features to be implemented since=20 > http-2.0.43 so that we could put Apache in our production = environment. I=20 > am not an Apache developper, so this is just a proposition, but I = tested=20 > it and it seemed to work. >=20 >=20 > The cache_util.c patch deals with the first issue. First, the=20 > Cache-Control directive seems to be in the r->err_headers_out and not = in=20 > the r->headers_out. Second, the following test seems useless : >=20 > if ((-1 < smaxage && age < (smaxage - minfresh)) || > (-1 < maxage && age < (maxage + maxstale - minfresh)) || > (info->expire !=3D APR_DATE_BAD && > age < (apr_time_sec(info->expire - info->date) + maxstale = -=20 > minfresh))) { >=20 > because it is always true, no matter if max-age is set or not. > Let's take an example (I suppose here s-maxage, max-stale and = min-fresh=20 > not set, > so smaxage =3D - 1, maxstale =3D 0 and minfresh =3D 0) : > - with age =3D 20, maxage =3D -1 (not set) and expire - date =3D 30, = the=20 > second test > is FALSE. The third is TRUE. So the whole test is TRUE, the page is=20 > considered > to be fresh =3D> no problem. > - with age =3D 20, maxage =3D 10 and expire - date =3D 30, the second = test is=20 > FALSE, > but the third is still TRUE. So the whole test is TRUE, the page is=20 > considered > to be fresh =3D> problem. >=20 >=20 > The mod_cache.c patch deals with the second issue. The info structure = is=20 > never initialized, and even if it was, the info->lastmods and = info->etag=20 > don't seem to be saved in the file when using mod_disk_cache. So I = used=20 > the Etag and Last-Modified informations we can find in the=20 > r->headers_out and r->err_headers_out. I don't know if it's correct, = but=20 > it seems to work now... > > Thanks for looking to these patch and eventually integrate it in the=20 > next Apache release ! >=20 > Thanks a lot for this really great product ! >=20 >=20 > Thomas. --=20 Paul J. Reder ----------------------------------------------------------- "The strength of the Constitution lies entirely in the determination of = each citizen to defend it. Only if every single citizen feels duty bound to = do his share in this defense are the constitutional rights secure." -- Albert Einstein ------_=_NextPart_001_01C33A37.8E975300 Content-Type: text/html; charset="ISO-8859-1" Content-Transfer-Encoding: quoted-printable RE: [PATCH] mod_cache RFC compliance

Thanks for looking into this Paul !

Concerning the second question, I totally agree with = you. I tested it and it works. It is obviously more logical...

I hope you will be able to integrate this patch in = the next apache release...

Thanks again,

Thomas.

 

-----Message d'origine-----
De : Paul J. Reder [mailto:rederpj@remulak.net]
Envoy=E9 : mardi 17 juin 2003 05:21
=C0 : dev@httpd.apache.org
Objet : Re: [PATCH] mod_cache RFC compliance


I am reviewing this patch and have a few questions = for Thomas or someone
in the know.

The first has to do with Thomas' observation that = Cache-Control is to be
found in r->err_headers_out rather than in = r->headers_out... I looked into
this and ran into the following piece of code in = mod_expires.c (expires_filter):

     /*
      * Check to see which = output header table we should use;
      * mod_cgi loads = script fields into r->err_headers_out,
      * for = instance.
      */
     expiry =3D = apr_table_get(r->err_headers_out, "Expires");
     if (expiry !=3D NULL) = {
         t = =3D r->err_headers_out;
     }
     else {
         = expiry =3D apr_table_get(r->headers_out, = "Expires");
         t = =3D r->headers_out;
     }

This code later calls set_expiration_fields which = adds Cache-Control
to whichever headers t points to.

Question 1:
Does this imply that the cache code needs to look = for ETags, Cache-Control,
etc in both places too? Right now the code all looks = in r->headers_out.
Thomas is changing some of them to look in = r->err_headers_out instead.
Is there a rule of thumb for determining when to = check headers_out vs.
err_headers_out?



The second observation is related to the freshness = computation referenced
below. I agree with Thomas that the code as it = stands isn't quite right,
but I disagree that negating the logic fixes it. If = I understand the
RFC correctly (section 13.2.4), max_age and s-maxage = take precedence over
an expires header. This would mean that, as Thomas = points out, if an
expires header is more liberal than the max_age = value then it currently
passes the freshness test even though it should = fail.

Question 2:
Am I correct in my belief that the fix requires = seperation of the checks
so that the maxage and s-maxage checks/computations = happen first and you
only evaluate freshness using the expires value if = there are no max-age or
s-maxage values present? (as in the following = code)

     if (((smaxage !=3D -1) = && (age < (smaxage - minfresh))) ||
         = ((maxage  !=3D -1) && (age < (maxage + maxstale - = minfresh))) ||
         = ((smaxage =3D=3D -1) && (maxage =3D=3D -1) &&
          = (info->expire !=3D APR_DATE_BAD) &&
          (age = < (apr_time_sec(info->expire - info->date) + maxstale - = minfresh)))) {
         /* = it's fresh darlings... */
         = ...
     }

In other words, if a value is specified for smaxage = or maxage, the freshness
is determined solely based on those values (without = regard to any expires
header). If there are no values specified for either = maxage or smaxage then
the freshness is determined based on the expires = header (if it exists). Is
that correct?

By the way, mod_disk_cache has a lot of issues. If it = isn't saving and
retrieving the headers correctly, its a bug in the = disk_cache code that
needs to be fixed. I wouldn't look to mod_disk_cache = as an example of
how the caching code should be working. :)


Thanks in advance for any insight that Thomas or = anyone else has to offer
on these questions. Given a couple of answers I can = have something
committed Tuesday. Thanks for the research and patch = Thomas.


CASTELLE Thomas wrote:
> Hello,
>
> In order to accelerate the RFC compliance of = mod_cache, I propose these
> two patches which fix two problems :
> - It doesn't handle the Cache-Control = directives (max-age, max-stale,
> min-fresh...) properly.
> - It doesn't send a = "If-Modified-Since" to avoid that the backend server
> sends every time a 200 response where a 304 = would be enough.
>
> Actually, we are waiting for these features to = be implemented since
> http-2.0.43 so that we could put Apache in our = production environment. I
> am not an Apache developper, so this is just a = proposition, but I tested
> it and it seemed to work.
>
>
> The cache_util.c patch deals with the first = issue. First, the
> Cache-Control directive seems to be in the = r->err_headers_out and not in
> the r->headers_out. Second, the following = test seems useless :
>
>         = if ((-1 < smaxage && age < (smaxage - minfresh)) = ||
>          = ; (-1 < maxage && age < (maxage + maxstale - minfresh)) = ||
>          = ; (info->expire !=3D APR_DATE_BAD &&
>          = ;  age < (apr_time_sec(info->expire - info->date) + = maxstale -
> minfresh))) {
>
> because it is always true, no matter if max-age = is set or not.
> Let's take an example (I suppose here s-maxage, = max-stale and min-fresh
> not set,
> so smaxage =3D - 1, maxstale =3D 0 and minfresh = =3D 0) :
> - with age =3D 20, maxage =3D -1 (not set) and = expire - date =3D 30, the
> second test
> is FALSE. The third is TRUE. So the whole test = is TRUE, the page is
> considered
> to be fresh =3D> no problem.
> - with age =3D 20, maxage =3D 10 and expire - = date =3D 30, the second test is
> FALSE,
> but the third is still TRUE. So the whole test = is TRUE, the page is
> considered
> to be fresh =3D> problem.
>
>
> The mod_cache.c patch deals with the second = issue. The info structure is
> never initialized, and even if it was, the = info->lastmods and info->etag
> don't seem to be saved in the file when using = mod_disk_cache. So I used
> the Etag and Last-Modified informations we can = find in the
> r->headers_out and r->err_headers_out. I = don't know if it's correct, but
> it seems to work now...
 >
> Thanks for looking to these patch and = eventually integrate it in the
> next Apache release !
>
> Thanks a lot for this really great product = !
>
>
> Thomas.

--
Paul J. Reder
-----------------------------------------------------------
"The strength of the Constitution lies entirely = in the determination of each
citizen to defend it.  Only if every single = citizen feels duty bound to do
his share in this defense are the constitutional = rights secure."
-- Albert Einstein

------_=_NextPart_001_01C33A37.8E975300--