cocoon-dev mailing list archives

From Gianugo Rabellino <gian...@apache.org>
Subject [RT] More on caching, expires, and proxy-friendly headers
Date Tue, 11 Feb 2003 10:57:31 GMT

This RT builds on the one posted more than a year ago, available at 
http://marc.theaimsgroup.com/?t=101074439900001&r=1&w=2.

As you know, we now have basic HTTP header control that mimics, at the 
pipeline level, the mod_expires functionality of the Apache HTTPD 
server. This was a good start, but I feel it's now time to refine and 
improve it. Work is needed on two fronts:

Proxy handling
==============

Full proxy compliance should be approached, once again :-), in 
microsteps. I've been reading the HTTP/1.1 spec and the proxy-related 
RFCs, and boy, implementing a fully proxy-compliant system is not easy 
at all. It can be done, but it requires serious thinking and a major 
rework of the request handling phase.

Full proxy compliance depends on the ability to deal with conditional 
requests, which means handling a bunch of request headers that are all 
interdependent in some way and tricky, to say the least. I'm not saying 
that we shouldn't do that sooner or later, but I'd rather plan this 
activity carefully, possibly together with someone (Chuck?) from the 
httpd group working on the proxy part, to make sure things work 
smoothly.

So, the first microstep is an easy one, just to get started. The 
companion to the Expires header is the "Cache-Control" header 
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9): this 
header allows finer-grained control over the response, telling proxies 
what to do with the results.

While Expires carries an HTTP date, built in Cocoon by adding the value 
of the pipeline@expires attribute to the current system time, 
Cache-Control is somewhat smarter, since it gives caches a hint on what 
is cacheable, how it should be cached (revalidated or not) and for how 
long, in seconds. To make a long story short, my proposal is to add a 
Cache-Control header to every response coming from a pipeline that has 
the "expires" attribute set, using the following template:

Cache-Control: max-age={expires value in seconds}, public

The "public" keyword tells proxies that they may store the resource in 
their cache even if it would normally be considered non-cacheable. This 
can be somewhat dangerous, since the proxy will serve cached copies of 
"protected" resources without authenticating against the origin server, 
but in the end I think it's safe to assume that if a pipeline is marked 
with an "expires" attribute, then the user is perfectly aware that the 
resource can, and will, be cached.
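To make the arithmetic concrete, here is a minimal sketch of how both header values could be derived from the pipeline's "expires" value, assuming (as in the patch below) that it is a relative duration in milliseconds. The class and method names (ExpiresHeaders, cacheControl, expires) are hypothetical and not part of Cocoon; in the patch itself the date formatting is left to setDateHeader.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/**
 * Hypothetical helper: derives Expires and Cache-Control values from a
 * relative "expires" duration in milliseconds. Not Cocoon API.
 */
public class ExpiresHeaders {

    /** Cache-Control value; max-age is expressed in whole seconds. */
    public static String cacheControl(long expiresMillis) {
        return "max-age=" + (expiresMillis / 1000) + ", public";
    }

    /** Expires value: an absolute HTTP (RFC 1123) date computed at response time. */
    public static String expires(long nowMillis, long expiresMillis) {
        SimpleDateFormat fmt =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(nowMillis + expiresMillis));
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        System.out.println(cacheControl(fiveMinutes)); // max-age=300, public
        System.out.println(expires(0L, fiveMinutes));  // Thu, 01 Jan 1970 00:05:00 GMT
    }
}
```

Keeping "expires" relative and adding the current time only when the response is written is the same choice the patch makes: the absolute date is recomputed for every response instead of being fixed once.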

The patch is a no-brainer:

Index: src/java/org/apache/cocoon/components/pipeline/AbstractProcessingPipeline.java
===================================================================
RCS file: /home/cvs/xml-cocoon2/src/java/org/apache/cocoon/components/pipeline/AbstractProcessingPipeline.java,v
retrieving revision 1.33
diff -r1.33 AbstractProcessingPipeline.java
468a469
>
472c473,474
<             res.setDateHeader("Expires", expires);
---
>             res.setDateHeader("Expires", System.currentTimeMillis() + expires);
>             res.setHeader("Cache-Control", "max-age=" + expires/1000 + ", public");
474c476
<                  new Long(expires));
---
>                  new Long(expires + System.currentTimeMillis()));
760c762
<         return System.currentTimeMillis() + expires;
---
>         return expires;

The only problem I see is that this header is not set under Tomcat 
(*argh*, Jetty works just fine!), so I have to investigate what's going 
wrong, but otherwise I'm ready to commit it if you agree with the idea 
(I'm reluctant to commit it right away since it touches the pipeline 
core, where I have hardly ever worked). Now for the second (and more 
interesting) point: Cocoon integration.

Cocoon integration
==================

The above approach works perfectly for communicating with the outside 
world, be it a reverse proxy or just a browser cache. Sometimes, 
however, you might want to use this concept internally as well: imagine 
an aggregation of different Cocoon pipelines, where some resources need 
their validity checked strictly, while others are expensive to generate 
and uncacheable (because the components involved are not cacheable 
themselves), yet you have full control over their expiration time. In 
this case, an internal use of the expires attribute would be pretty 
useful, e.g.:

<pipeline internal-only="true">
   <parameter name="expires" value="now plus 5 minutes"/>
   <match pattern="my-heavy-resource">
     <generate src="xmldb:xindice:///db/not/changing/frequently"/>
     <serialize/>
   </match>
</pipeline>

<pipeline internal-only="true">
   <match pattern="my-dynamic-resource">
     <generate src="/content/that/might/change"/>
     <serialize/>
   </match>
</pipeline>

<pipeline>
   <match pattern="mybeautifulportal.html">
     <aggregate element="portal">
       <part src="cocoon://my-heavy-resource" element="news"/>
       <part src="cocoon://my-dynamic-resource" element="data"/>
     </aggregate>
     <transform src="myportal2html.xsl"/>
     <serialize type="html"/>
   </match>
</pipeline>
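The "now plus 5 minutes" value above follows the mod_expires style of "<base> plus <number> <unit>". As a hedged illustration of turning such a value into the millisecond duration the pipeline works with, here is a hypothetical parser; it is not Cocoon's actual parsing code, accepts only "now" or "access" as the base, and approximates a month as 30 days and a year as 365 days.

```java
import java.util.Locale;

/**
 * Hypothetical parser for mod_expires-style values such as
 * "now plus 5 minutes" or "access plus 1 hour 30 minutes".
 * Returns the duration in milliseconds. Not Cocoon's actual parser.
 */
public class ExpiresParser {

    public static long parse(String value) {
        String[] tokens = value.trim().toLowerCase(Locale.ROOT).split("\\s+");
        // Expect "<base> plus" followed by one or more "<number> <unit>" pairs.
        if (tokens.length < 4 || tokens.length % 2 != 0
                || !(tokens[0].equals("now") || tokens[0].equals("access"))
                || !tokens[1].equals("plus")) {
            throw new IllegalArgumentException("Unsupported expires value: " + value);
        }
        long total = 0;
        for (int i = 2; i < tokens.length; i += 2) {
            long amount = Long.parseLong(tokens[i]);
            total += amount * unitMillis(tokens[i + 1]);
        }
        return total;
    }

    private static long unitMillis(String unit) {
        // Accept singular and plural forms ("minute" / "minutes").
        String u = unit.endsWith("s") ? unit.substring(0, unit.length() - 1) : unit;
        switch (u) {
            case "second": return 1000L;
            case "minute": return 60_000L;
            case "hour":   return 3_600_000L;
            case "day":    return 86_400_000L;
            case "week":   return 7 * 86_400_000L;
            case "month":  return 30 * 86_400_000L;  // approximation: 30 days
            case "year":   return 365 * 86_400_000L; // approximation: 365 days
            default: throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("now plus 5 minutes")); // 300000
    }
}
```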


If we agree that this is useful, let's look at the actual 
implementation. First, let's get back to the general principle: if a 
user sets an "expires" attribute on a pipeline, what she wants to say is 
"I know better than the Cocoon cache for how long this resource has to 
be considered fresh". This is by all means a configuration imposed by 
the user, which the caching system should obey blindly. My opinion, 
then, wrt the caching pipeline, is that if an expires was set, all the 
pipeline engine should do is check whether the given resource has 
already been generated and whether its expiration time has not passed 
yet. If so, the resource should be considered fresh, regardless of any 
Validity objects or Cacheable components.
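The rule above can be sketched as a single decision function. This is a minimal sketch under stated assumptions: CacheEntry is a hypothetical stand-in for a cached entry (an expiresAt below zero means no expires was configured), and the BooleanSupplier stands in for Cocoon's regular Validity-based check; neither is Cocoon's real API.

```java
import java.util.function.BooleanSupplier;

/**
 * Sketch of the proposed freshness rule: with "expires" set, trust the
 * cached entry until it expires and never consult validity objects.
 * All types here are hypothetical stand-ins.
 */
public class ExpiresFreshness {

    /** Hypothetical cache entry; expiresAt < 0 means "no expires configured". */
    static final class CacheEntry {
        final long expiresAt;
        CacheEntry(long expiresAt) { this.expiresAt = expiresAt; }
        boolean hasExpires() { return expiresAt >= 0; }
    }

    static boolean isFresh(CacheEntry entry, long now, BooleanSupplier validityCheck) {
        if (entry == null) {
            return false; // nothing generated yet: the pipeline must run
        }
        if (entry.hasExpires()) {
            // User-imposed lifetime: obey it blindly, skip the validity check.
            return now < entry.expiresAt;
        }
        return validityCheck.getAsBoolean(); // regular Validity-based path
    }

    public static void main(String[] args) {
        BooleanSupplier expensiveCheck = () -> {
            throw new IllegalStateException("validity check should have been skipped");
        };
        System.out.println(isFresh(new CacheEntry(1_000L), 500L, expensiveCheck));   // true
        System.out.println(isFresh(new CacheEntry(1_000L), 1_500L, expensiveCheck)); // false
    }
}
```

The throwing supplier in main demonstrates the point of the proposal: when expires is set, the (potentially heavy) validity check is never invoked.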

This, AFAIU, would boost performance even for internal pipelines and 
aggregation, and would let us use internal pipelines in a smarter and 
faster way. Not only that: if we use the expires feature internally as 
well, Cocoon's performance will get a boost even without a reverse 
proxy in front of the application server, since all the (potentially 
heavy) algorithms that check the resource's validity would be skipped.

As for the implementation, I wish I knew the Cocoon caching internals 
better, but from a quick read it seems to me that there should be:

- some logic in CachedResponse to store and get expires (easy);

- appropriate logic at the proper points to obtain the expires object 
from the environment and set up a CachedResponse accordingly (is it 
enough to change CachingProcessingPipeline#cacheResults?);

- more logic in the validatePipeline() method in 
AbstractCachingProcessingPipeline.java to take into account the expires 
object configured, if present.

- in all cases, all the algorithms that check if a cached entry is still 
valid, i.e. every place where a cache entry is built, validated or 
invalidated, should take into account the expires configuration.
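For the storage side of the first two points, a hedged sketch of what "store and get expires" could look like: the relative expires is turned into an absolute time when results are cached, and read back at lookup time. ExpiringCachedResponse, cacheResults and lookupUnexpired here are hypothetical illustrations, not Cocoon's actual CachedResponse or CachingProcessingPipeline, and the HashMap stands in for the real cache store.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a cached response that also carries an absolute
 * expiration time. Not Cocoon's CachedResponse; names are illustrative.
 */
public class ExpiresCacheSketch {

    static final class ExpiringCachedResponse {
        private final byte[] response;
        private final long expiresAt; // absolute time in ms, or -1 if no expires was configured

        ExpiringCachedResponse(byte[] response, long expiresAt) {
            this.response = response;
            this.expiresAt = expiresAt;
        }
        byte[] getResponse() { return response; }
        long getExpires() { return expiresAt; }
    }

    private final Map<String, ExpiringCachedResponse> cache = new HashMap<>();

    /** Analogue of cacheResults(): convert the relative expires (ms) to an absolute time. */
    void cacheResults(String key, byte[] content, long expiresMillis, long now) {
        long expiresAt = expiresMillis > 0 ? now + expiresMillis : -1;
        cache.put(key, new ExpiringCachedResponse(content, expiresAt));
    }

    /**
     * Returns the cached bytes while an expires is set and not yet passed.
     * null means "fall back to validation or regenerate".
     */
    byte[] lookupUnexpired(String key, long now) {
        ExpiringCachedResponse cached = cache.get(key);
        if (cached == null || cached.getExpires() < 0) {
            return null;
        }
        if (now >= cached.getExpires()) {
            cache.remove(key); // expired: invalidate the entry
            return null;
        }
        return cached.getResponse();
    }

    public static void main(String[] args) {
        ExpiresCacheSketch sketch = new ExpiresCacheSketch();
        sketch.cacheResults("my-heavy-resource", "<news/>".getBytes(), 300_000L, 0L);
        System.out.println(sketch.lookupUnexpired("my-heavy-resource", 1_000L) != null);   // true
        System.out.println(sketch.lookupUnexpired("my-heavy-resource", 300_000L) != null); // false
    }
}
```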

I have started to play with this too, but I am wondering whether I'm 
following the right path or if I'm missing something. Also, it might be 
worth considering a separate CachingPipeline implementation 
(ExpiresEnabledCachingPipeline? Yuck ;-)), at least as a first step.

Comments and questions?

Ciao,

-- 
Gianugo Rabellino
Pro-netics s.r.l.
http://www.pro-netics.com



