From: Stefano Mazzocchi
To: dev@cocoon.apache.org
Subject: Re: Accessing cache validities from flow
Date: Wed, 17 Dec 2003 18:07:42 -0500
In-Reply-To: <28936.1071601366@www48.gmx.net>
References: <264DFACA-2FA5-11D8-8A39-000393D2CB02@apache.org> <28936.1071601366@www48.gmx.net>
Mailing-List: contact dev-help@cocoon.apache.org; run by ezmlm

On 16 Dec 2003, at 14:02, bernhard huber wrote:

> hi,
>
>> Now, the way the event cache works is like this:
>>
>> a) a cache validity is generated
>> b) pipeline is executed
>> c) result is stored in the cache
>>
>> then the pipeline is never called, until an event is triggered
>> externally (from an avalon component) that invalidates that particular
>> cache entity.
>
> Some experiences I had using some sort of simple Servlet Cache Filter
> using caching by sessionid:
> The session is not touched as long the cache entry is valid, the session
> gets expired due to this caching.
> But perhaps that's just an issue of the servlet engine, or the Servlet
> CachFilter issue,
>
> Your sentence ..the pipeline is never called, just reminded me of the
> that situation,
> and of the danger of pruning to optimistically.

Through my JSR 170 work, I've been exposed to what Day Software does with their Communique CMS. What they do is architecturally very simple, yet extremely elegant and effective.

They don't use the file system. Never. They store everything in a repository. Consider it a virtual file system with observable hooks for now (it's much more than that, but that's not important for this discussion).
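
To make "a virtual file system with observable hooks" a bit more concrete, here is a minimal sketch in Java. All names in it (ObservableRepository, addListener, read, write) are hypothetical and made up for illustration; this is not Day's API nor anything defined by JSR 170, just one simple way such hooks could look.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: a path-addressed store that notifies registered hooks
// on every write. Not Day Software's or JSR 170's actual API.
final class ObservableRepository {
    private final Map<String, String> content = new HashMap<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();

    // Register a hook that receives the path of every modified item.
    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Plain read; returns null when nothing is stored at the path.
    String read(String path) {
        return content.get(path);
    }

    // Store the value, then notify every hook synchronously.
    void write(String path, String value) {
        content.put(path, value);
        for (Consumer<String> listener : listeners) {
            listener.accept(path);
        }
    }
}
```

A cache would register itself as a listener and react to the paths it is told about, instead of polling the data source for changes.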

Whenever a resource is generated by the publishing layer, this layer instantiates a sort of "reading transaction" so that the repository can keep track of all the dependencies of that particular resource. Note that they have libraries that, for example, generate images out of markup (sort-of Batik serializer style), so those dependencies might be quite numerous (I heard up to 100 files for a single resource).

When a resource is modified in the repository, the tree of dependencies is crawled "backwards" and all resources that depend on it get invalidated.

Invalidation goes all the way up to an Apache module.

This allows Communique to handle *extreme* load (they run Sony Style with just two boxes for fault tolerance and simple load balancing, and that site generates tens of millions of requests per day, with huge peaks at break times).

Note that Communique is a 100% pure Java servlet, and the repository is all Java again and runs in the same JVM: no database at all, no networking overhead.

How do they do that? Well, the first thing is that most requests are handled directly by the web server... the servlet engine is called only when the resource needs to be regenerated. This leaves the machines doing almost nothing all day (if you run stuff from mod_cache, you can fill a T1 with a 486) and ready to go when a new resource has to be generated.

Now, the drawbacks:

1) if you are *not* in control of your data environment, the above system doesn't work... unless you have synchronous polling on the datasources... which is not any better than the caching system we have.

2) the caching strategy is centralized. I'm not sure if components can have their own, but for sure it's a pain. [note: they don't have a pipelined rendering layer, just a one-stage, template-driven approach]

Communique is a publishing system on steroids, so I hear that writing an entire web application with Communique is probably harder than using a simple webapp framework.
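
The "reading transaction" plus backward crawl described above can be sketched in a few lines of Java. This is only an illustration under my own assumptions: the class and method names (DependencyTrackingCache, store, lookup, modified) are hypothetical, the read set is passed in by the caller instead of being recorded by a real transaction, and it says nothing about how Communique actually implements it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of dependency-tracked invalidation; names are invented.
final class DependencyTrackingCache {
    // Generated resource -> cached output.
    private final Map<String, String> cache = new HashMap<>();
    // Item -> resources that read it while being generated (reverse edges).
    private final Map<String, Set<String>> dependents = new HashMap<>();

    // Stand-in for the "reading transaction": the caller hands over the set
    // of items that were read while producing the resource.
    void store(String resource, String output, Set<String> readItems) {
        cache.put(resource, output);
        for (String item : readItems) {
            dependents.computeIfAbsent(item, k -> new HashSet<>()).add(resource);
        }
    }

    // Returns the cached output, or null if it must be regenerated.
    String lookup(String resource) {
        return cache.get(resource);
    }

    // Crawl the dependency graph "backwards" from the modified item and
    // evict every cached resource reached. Returns the evicted names.
    Set<String> modified(String item) {
        Set<String> visited = new HashSet<>();
        Set<String> evicted = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(item);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!visited.add(current)) {
                continue; // already handled, also guards against cycles
            }
            if (cache.remove(current) != null) {
                evicted.add(current);
            }
            // Edges are re-registered when a dependent is regenerated.
            Set<String> users = dependents.remove(current);
            if (users != null) {
                pending.addAll(users);
            }
        }
        return evicted;
    }
}
```

For example, if logo.png was generated from logo.svg and index.html read logo.png, then modified("logo.svg") evicts both logo.png and index.html and leaves unrelated pages alone. The set it returns is what would then be pushed further up, for instance to the web server's cache.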

Cocoon wants to do both things and do them well, with as little effort and code as possible.

Cocoon cannot have a predefined global caching strategy; it doesn't make sense.

But it *does* make sense to have a pipeline-granular caching strategy, with the ability to modify it at the component level.

We have this already; we just need to polish it up a little and find out what is *really* useful and how things can be made more usable.

Today, modifying the caching strategy at the component level is black magic: nobody does it. I'm scared of it myself, so I can't even imagine users trying to do this themselves.

The off-the-shelf pipeline caches have some "magic" associated with them.... they are black boxes, basically; nobody really knows when something is being cached or not.... it's hard to tell, hard to visualize, hard to control, hard to tune and hard to modify.

This makes the whole thing much less powerful than it really is.

You know how much I care about caching, but there is still a lot of work to do... especially now that new "inverted" scenarios of use are going to appear on the horizon with observable repositories.

--
Stefano.