cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [RT] More on caching, expires, and proxy-friendly headers
Date Tue, 11 Feb 2003 12:54:56 GMT
Gianugo Rabellino wrote:
> 
> This RT integrates the one done more than one year ago and available at 
> http://marc.theaimsgroup.com/?t=101074439900001&r=1&w=2.
> 
> As of now you know that we have a basic HTTP header control that mimics 
> at a pipeline level the mod_expires functionality of the Apache HTTPD 
> server. This was a good start, but now I feel it's time to refine it and 
> make it better. Work is needed on two sides:
> 
> Proxy handling
> ==============
> 
> The approach to full proxy compliance should be done, once again :-), in 
> microsteps. I've been reading the HTTP/1.1 specs and the proxy-related 
> RFCs, and boy, it's not easy at all to implement a fully proxy compliant 
> system. It can be done, but it requires serious thinking and a major 
> rework of the request handling phase.
> 
> Full proxy compliance depends on the ability to deal with conditional 
> requests, handling a bunch of request headers that are all in some way 
> interdependent and tricky, to say the least. I'm not saying that we 
> shouldn't do that sooner or later, but I'd rather plan this activity 
> carefully, and possibly together with someone (Chuck?) from the httpd 
> group working on the proxy part, in order to ensure that things work 
> smoothly.

At ApacheCon I sat with Chuck, who forwarded me to another guy who's the 
one doing the work nowadays, but I forgot who he was. I can ask him 
again, though.

> So, the first microstep is an easy one, just as a start. The companion 
> to the expires header is the "Cache-Control" header 
> (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9): this 
> header allows for finer-grained control over the response, telling 
> proxies what to do with the results.
> 
> While Expires uses an HTTP date header, built in Cocoon by adding the 
> result of the pipeline@expires attribute to the current system time, 
> Cache-Control is somewhat smarter, since it gives caches a hint on what 
> is cacheable, how it should be cached (revalidated or not) and for how 
> long in seconds. To make it short, my proposal is to add a Cache-Control 
> header to any response coming from a pipeline with the "expires" 
> attribute set, using the following template:
> 
> Cache-Control: max-age={expires value in seconds}, public
> 
> The "public" keyword instructs the proxy to store a resource in its 
> cache even if it would otherwise not be considered cacheable. This can 
> be somewhat dangerous, since the proxy will serve requests for 
> "protected" resources without performing authentication on the origin 
> server, but in the end I think that it's safe to assume that if a 
> pipeline is marked with an "expires" attribute, then the user is 
> perfectly aware that such a resource can, and will, be cached.
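
To make the proposed header pair concrete, here is a minimal Java sketch
of how both values could be derived from a single "expires" duration in
milliseconds. The class and method names (ExpiresHeaders, expires,
cacheControl) are hypothetical and not part of Cocoon, and the sketch
uses java.time, which is newer than the code this thread is about; only
the two header formats are taken from the proposal above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not a Cocoon class: derives both header values
// from one "expires" duration given in milliseconds.
final class ExpiresHeaders {

    // Fixed-length date format commonly used for HTTP date headers.
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // Expires is an absolute date: current time plus the configured duration.
    static String expires(long nowMillis, long expiresMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(nowMillis + expiresMillis));
    }

    // Cache-Control carries the same duration as a relative value in seconds.
    static String cacheControl(long expiresMillis) {
        return "max-age=" + (expiresMillis / 1000) + ", public";
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        System.out.println("Expires: " + expires(System.currentTimeMillis(), fiveMinutes));
        System.out.println("Cache-Control: " + cacheControl(fiveMinutes));
    }
}
```

Because max-age is relative, a cache can compute freshness from it
without depending on the origin server's clock, which is one reason to
send it alongside Expires.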

Question: (and an important one)

Suppose you have a resource like

  /images/logo

that you hit with two different user agents and that a pipeline renders 
differently depending on the user agent. How can a proxy handle this 
gracefully? Do we have a way to specify that a specific request has to be 
matched not only against a URI but also against the user agent that 
requested it?

I'm perfectly aware of the fact that we could have the resource 
/images/logo redirect to /images/logo.png or /images/logo.gif depending 
on the user agent, and that this would route around the proxy problem, 
but that's a hack and involves another round-trip to the client for the 
redirect.

Pier and I came up with this question and we think it might be an HTTP 
architectural fault, but before asking Roy, what do you people think?

> The patch is a no-brainer, such as:
> 
> Index: src/java/org/apache/cocoon/components/pipeline/AbstractProcessingPipeline.java
> ===================================================================
> RCS file: /home/cvs/xml-cocoon2/src/java/org/apache/cocoon/components/pipeline/AbstractProcessingPipeline.java,v
> retrieving revision 1.33
> diff -r1.33 AbstractProcessingPipeline.java
> 468a469
> >
> 472c473,474
> <             res.setDateHeader("Expires", expires);
> ---
> >             res.setDateHeader("Expires", System.currentTimeMillis() + expires);
> >             res.setHeader("Cache-Control", "max-age=" + expires/1000 + ", public");
> 474c476
> <                  new Long(expires));
> ---
> >                  new Long(expires + System.currentTimeMillis()));
> 760c762
> <         return System.currentTimeMillis() + expires;
> ---
> >         return expires;
> 
> The only problem I see is that this header is not set under Tomcat 
> (*argh*, Jetty works just OK!)

Why am I not surprised?

> so I have to investigate what's going 
> wrong, but for the rest I'm ready to commit it if you agree on the idea 
> (I'm reluctant to commit it right away since it somehow touches the 
> pipeline core, where I almost never worked). Now for the second (and 
> more interesting) point: Cocoon integration.
> 
> Cocoon integration
> ==================
> 
> The above approach works perfectly for communication with the external 
> world, be it a reverse proxy or just a browser cache. Sometimes, 
> however, you might want to use this concept internally: imagine having 
> an aggregation of different Cocoon pipelines, where you have some 
> resources for which you want to check validity strictly and some others 
> that are pretty heavy to generate and uncacheable, because the 
> components you are using are not cacheable by themselves, but over 
> whose expiration time you have full control. In this case, having an 
> internal use of the expires attribute would be pretty useful, i.e.:
> 
> <pipeline internal-only="true">
>   <parameter name="expires" value="now plus 5 minutes"/>

Isn't something like

  <parameter name="TTL" value="5 minutes"/>

simpler to understand? We don't expect people to write

  <parameter name="expires" value="tomorrow plus 25 hours"/>

do we?

>   <match pattern="my-heavy-resource">
>     <generate src="xmldb:xindice:///db/not/changing/frequently"/>
>     <serialize/>
>   </match>
> </pipeline>
> 
> <pipeline internal-only="true">
>   <match pattern="my-dynamic-resource">
>     <generate src="/content/that/might/change"/>
>     <serialize/>
>   </match>
> </pipeline>
> 
> <pipeline>
>   <match pattern="mybeautifulportal.html">
>     <aggregate element="portal">
>       <part src="cocoon://my-heavy-resource" element="news"/>
>       <part src="cocoon://my-dynamic-resource" element="data"/>
>     </aggregate>
>     <transform src="myportal2html.xsl"/>
>     <serialize type="html"/>
>   </match>
> </pipeline>
> 
> 
> If we agree that this is useful, let's see the actual implementation. 
> First, let's get back to the general principle: if a user sets an 
> "expires" attribute on a pipeline, what she wants to say is "I know 
> better than the Cocoon cache for how long this resource has to be 
> considered fresh". This is by all means a configuration imposed by the 
> user, which the caching system should obey blindly. My opinion
> then, wrt the caching pipeline, is that if an expires was set, all the 
> pipeline engine should do is to check if the given resource has already 
> been generated, and if the expiration time has not passed yet. If so, 
> the resource should be considered fresh disregarding any Validity 
> objects or Cacheable components.
> 
> This, AFAIU, would boost the performance even for internal pipelines and 
> aggregation, and would let us use internal pipelines in a smarter and 
> faster way. Not only that: if we are to use the expires feature even 
> internally, Cocoon's performance will get a boost even without using a 
> reverse proxy in front of the application server, since all the 
> (potentially heavy) algorithms to check the resource's validity would be 
> skipped.
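
The rule described above (a configured expires wins over any validity
check while the expiration time has not passed) could be sketched in
Java roughly as follows. ExpiresAwareEntry and its methods are
hypothetical names, not Cocoon's caching API, and falling back to the
normal validity check once the window has elapsed is an assumption: the
proposal does not say what should happen at that point.

```java
import java.util.function.BooleanSupplier;

// Hypothetical cache entry, not a Cocoon class: remembers when the
// response was generated and the "expires" duration configured for it.
final class ExpiresAwareEntry {
    private final long createdMillis;
    private final long expiresMillis; // 0 or negative: no expires configured

    ExpiresAwareEntry(long createdMillis, long expiresMillis) {
        this.createdMillis = createdMillis;
        this.expiresMillis = expiresMillis;
    }

    // True only if an expires was configured and its window is still open.
    boolean isFreshByExpires(long nowMillis) {
        return expiresMillis > 0 && nowMillis < createdMillis + expiresMillis;
    }

    // While the expires window is open the (potentially heavy) validity
    // check is never invoked. Assumption: afterwards we fall back to it.
    boolean isUsable(long nowMillis, BooleanSupplier validityCheck) {
        if (isFreshByExpires(nowMillis)) {
            return true;
        }
        return validityCheck.getAsBoolean();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        ExpiresAwareEntry entry = new ExpiresAwareEntry(now, 5 * 60 * 1000L);
        // The validity check below is skipped because the entry is still fresh.
        System.out.println(entry.isUsable(now + 1000L, () -> false));
    }
}
```

Passing the validity check as a callback keeps the expensive part lazy,
which is where the expected performance gain comes from.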
> 
> Now for the implementation: I wish I knew the Cocoon caching internals 
> better, but from a quick read it seems to me that there should be:
> 
> - some logic in CachedResponse to store and get expires (easy);
> 
> - appropriate logic in the proper points to obtain the expires object 
> from the environment and set a CachedResponse accordingly (is it enough 
> to change CachingProcessingPipeline#cacheResults?);
> 
> - more logic in the validatePipeline() method in 
> AbstractCachingProcessingPipeline.java to take into account the expires 
> object configured, if present.
> 
> - in all cases, all the algorithms that check if a cached entry is still 
> valid, i.e. every place where a cache entry is built, validated or 
> invalidated, should take into account the expires configuration.
> 
> I have started to play with this too, but I am wondering if I'm 
> following the right path or if I'm missing something. Also, it might be 
> worth considering a different CachingPipeline implementation 
> (ExpiresEnabledCachingPipeline? Yuck ;-)), at least as a start.
> 
> Comments and questions?

Sounds like a good idea, but I think Carsten is the one who knows the 
caching internals best.

One thing that you didn't describe is the ability for Cocoon to reply to 
proxy requests with the 'body hasn't changed' status code (304 Not 
Modified) just by using the pipeline caching logic, without having to 
regenerate the whole thing. Is this another microstep, or do you have 
reasons against it?
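
For reference, the decision behind such a reply (HTTP status 304 Not
Modified) can be sketched in a few lines of Java. ConditionalGet and
statusFor are hypothetical names; the sketch assumes the cache can
report a last-modified time for the entry, and that the caller passes
the If-Modified-Since value in milliseconds, or -1 when the header is
absent (the convention used by HttpServletRequest.getDateHeader).

```java
// Hypothetical helper, not a Cocoon class: decides between a full
// response and "304 Not Modified" for a conditional GET.
final class ConditionalGet {
    static final int SC_OK = 200;
    static final int SC_NOT_MODIFIED = 304;

    static int statusFor(long ifModifiedSinceMillis, long lastModifiedMillis) {
        if (ifModifiedSinceMillis < 0) {
            return SC_OK; // no conditional header: send the full body
        }
        // HTTP dates have one-second resolution, so compare whole seconds.
        long sinceSeconds = ifModifiedSinceMillis / 1000;
        long lastModifiedSeconds = lastModifiedMillis / 1000;
        return lastModifiedSeconds <= sinceSeconds ? SC_NOT_MODIFIED : SC_OK;
    }

    public static void main(String[] args) {
        // Entry unchanged since the client's copy: no body needs to be sent.
        System.out.println(statusFor(20_000L, 10_000L));
    }
}
```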

-- 
Stefano Mazzocchi                               <stefano@apache.org>
    Pluralitas non est ponenda sine necessitate [William of Ockham]
--------------------------------------------------------------------


