cocoon-dev mailing list archives

From Sylvain Wallez <sylv...@apache.org>
Subject Re: Accessing cache validities from flow
Date Thu, 18 Dec 2003 10:27:26 GMT
Stefano Mazzocchi wrote:

>
> On 16 Dec 2003, at 14:02, bernhard huber wrote:
>
>> hi,
>> <snip/>
>>
>>>
>>> Now, the way the event cache works is like this:
>>>
>>>   a) a cache validity is generated
>>>   b) pipeline is executed
>>>   c) result is stored in the cache
>>>
>>> then the pipeline is never called, until an event is triggered 
>>> externally (from an avalon component) that invalidates that 
>>> particular cache entity.
>>
>> Some experience I had with a simple Servlet Cache Filter that caches 
>> by session id: the session is not touched as long as the cache entry 
>> is valid, so the session expires because of this caching. But perhaps 
>> that's just an issue with the servlet engine, or with the Servlet 
>> CacheFilter.
>>
>> Your sentence "the pipeline is never called" just reminded me of 
>> that situation, and of the danger of pruning too optimistically.
>
>
> Through my JSR 170 work, I've been exposed to what Day Software does 
> with their Communique CMS.
>
> What they do is very simple architecturally yet extremely elegant and 
> effective.
>
> They don't use the file system. Never. They store everything in a 
> repository. Consider it a virtual file system with observable hooks 
> for now (it's much more than that but it's not important for this 
> discussion).
>
> Whenever a resource is generated by the publishing layer, this layer 
> instantiates a sort of "reading transaction" so that the repository 
> can keep track of all the dependencies of that particular resource.
>
> Note that they have libraries that, for example, generate images out 
> of markup (sort-of Batik serializer style) so those dependencies might 
> be quite big (I heard up to 100 files for a single resource).
>
> When a resource is modified in the repository, the tree of 
> dependencies is crawled "backwards" and all resources that depend on 
> it get invalidated. Invalidation goes all the way up to an Apache 
> module.
>
> This allows Communique to handle *extreme* load (they run Sony Style 
> with just two boxes for fault tolerance and simple load balancing and 
> that site generates tens of millions of requests per day, with huge 
> peaks at break times). Note that Communique is a 100% pure Java 
> servlet and the repository is all Java again and runs in the same JVM: 
> no database at all, no networking overhead.
>
> How do they do that? Well, the first thing is that most requests are 
> handled directly by the web server... the servlet engine is called 
> only when the resource needs to be regenerated.
>
> This leaves the machines almost doing nothing all day (if you run 
> stuff from mod_cache, you can fill a T1 with a 486) and ready to go 
> when a new resource has to be generated.
>
> Now, the drawbacks:
>
>  1) if you are *not* in control of your data environment, the above 
> system doesn't work... unless you have synchronous polling on the 
> datasources... which is not any better than the caching system we have.
>
>  2) the caching strategy is centralized. I'm not sure if components 
> can have their own, but for sure it's a pain. [note: they don't have a 
> pipelined rendering layer, just a one stage, template driven, approach]
>
> Communique is a publishing system on steroids, so I hear that writing 
> an entire web application with Communique is probably harder than 
> using a simple webapp framework.
>
> Cocoon wants to do both things and do them well, with as little effort 
> and code as possible.
>
> Cocoon cannot have a predefined global caching strategy, it doesn't 
> make sense. But it *does* make sense to have a pipeline-granular 
> caching strategy, with the ability to modify it at the component level.
>
> We have this already, we just need to polish it up a little and find 
> out what is *really* useful and how things can be made more usable.
>
> Today, modifying the caching strategy at the component level is black 
> magic: nobody does it. I'm scared of it myself, so I can't even 
> imagine users trying to do this themselves.
>
> The off-the-shelf pipeline caches have some "magic" associated with 
> them.... they are black boxes, basically, nobody really knows whether 
> something is being cached or not.... it's hard to tell, hard to 
> visualize, hard to control, hard to tune and hard to modify.
>
> This makes the whole thing much less powerful than it really is.
>
> You know how much I care about caching, but there is still a lot of 
> work to do... especially now that new "inverted" scenarios of use are 
> going to appear on the horizon with observable repositories.


We're talking about validities, but before checking a validity, we first 
have to obtain it through the cache key.

In the current Cocoon architecture, keys of cache entries are built from 
arbitrary data defined by each of the individual pipeline components. The 
result of this is that we can have several different cached responses 
for a single request definition (URI + headers).

The big benefit of this approach is that many variations can be cached 
(depending on night/day, local weather, whatever), but the main 
disadvantage is that the pipeline *must* be built for every request in 
order to compute the cache key, even if the response is served from the 
cache afterwards.
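
To make that cost concrete, here is a minimal sketch of the current key 
strategy. It is not Cocoon's actual API: `KeyedComponent`, 
`PipelineKeySketch`, the two example components and the `daytime` 
metadata entry are all hypothetical names chosen for illustration. The 
point is only that the key cannot be computed without first assembling 
the pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a cacheable pipeline component (not Cocoon's real interface).
interface KeyedComponent {
    String keyFor(Map<String, String> request);
}

final class PipelineKeySketch {
    // Counts pipeline assemblies so the per-request cost is visible.
    static int pipelinesBuilt = 0;

    // Assembling the pipeline stands for sitemap matching and component setup.
    static List<KeyedComponent> buildPipeline() {
        pipelinesBuilt++;
        List<KeyedComponent> components = new ArrayList<>();
        // Each component contributes arbitrary data of its own choosing.
        components.add(request -> "gen:" + request.get("uri"));
        components.add(request -> "xslt:" + request.getOrDefault("daytime", "day"));
        return components;
    }

    // The cache key is the concatenation of every component's key,
    // so the pipeline has to be built even when the response is cached.
    static String cacheKey(Map<String, String> request) {
        StringBuilder key = new StringBuilder();
        for (KeyedComponent component : buildPipeline()) {
            if (key.length() > 0) {
                key.append('|');
            }
            key.append(component.keyFor(request));
        }
        return key.toString();
    }
}
```

Calling `cacheKey` twice for the same request yields the same key, but 
`pipelinesBuilt` goes up by two: one assembly per request, cache hit or 
not.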

A solution would be to have another pipeline implementation that uses a 
different strategy to build cache keys. What comes to mind is that 
instead of returning arbitrary values for the key, components could 
return some matching criteria on request metadata. The pipeline could 
then organize the cache entries by URI, each URI having a list of cached 
responses along with their matching criteria.

This approach would reduce the possible cached variations for a given 
request, but would allow us to find cached content (and its validity) 
without incurring the cost of building the pipeline.
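
A rough sketch of what such a URI-indexed cache could look like. 
Everything here is an assumption made for illustration: the class name 
`UriCacheSketch`, the representation of matching criteria as exact-match 
name/value pairs on request metadata, and the cached response being a 
plain string. Validity checking is left out; it would happen on the 
entry returned by the lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class UriCacheSketch {
    // One cached response plus the request metadata values it requires.
    static final class Entry {
        final Map<String, String> criteria;
        final String response;

        Entry(Map<String, String> criteria, String response) {
            this.criteria = new HashMap<>(criteria);
            this.response = response;
        }

        // An entry matches when every criterion equals the request's value.
        // Metadata not named in the criteria is ignored.
        boolean matches(Map<String, String> metadata) {
            for (Map.Entry<String, String> criterion : criteria.entrySet()) {
                if (!criterion.getValue().equals(metadata.get(criterion.getKey()))) {
                    return false;
                }
            }
            return true;
        }
    }

    // Cache entries organized by URI, each URI holding its cached variations.
    private final Map<String, List<Entry>> entriesByUri = new HashMap<>();

    void store(String uri, Map<String, String> criteria, String response) {
        entriesByUri.computeIfAbsent(uri, u -> new ArrayList<>())
                    .add(new Entry(criteria, response));
    }

    // Finds a cached response from the URI and metadata alone:
    // no pipeline is built on this path.
    Optional<String> lookup(String uri, Map<String, String> metadata) {
        for (Entry entry : entriesByUri.getOrDefault(uri, new ArrayList<>())) {
            if (entry.matches(metadata)) {
                return Optional.of(entry.response);
            }
        }
        return Optional.empty();
    }
}
```

With this shape, components can only vary the cache on what they can 
express as criteria over request metadata, which is the reduction in 
possible variations mentioned above.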

What do you think?

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com
