From: "Hunsberger, Peter"
To: cocoon-dev@xml.apache.org
Date: Thu, 26 Jun 2003 13:24:50 -0500
Subject: RE: External/Event Based Cache Invalidation (somewhat long)

Geoff Howard writes:

> Below is the larger picture I envision for a new kind of
> cache invalidation that I've needed in the past and that comes
> up in requests from people using EJB or database driven data
> that is cacheable. I'd love feedback from anyone who's
> interested.

As you know, we're interested. However, we're in the middle of rolling
out our first release at the moment. Until the bug reports slow down we
won't really have any time to look at this....

> That leaves only a few choices
> - make the event know about what cache keys it needs to
> remove. This is the only solution currently available and it
> has in practice meant hard coding the event-key relationship
> somewhere and manually maintaining it. Not good IMHO.

This is essentially what we currently do. It's a bit more generalized in
that we have specific data classes associated with specific Cocoon
generators, and we've added an interface to those classes that specifies
a method for generating the cache key for any given data item. We then
maintain our own hash map that maps the events to the keys. We don't
remove the items from the Cocoon cache; rather, we map to the cache
validity objects and call an "invalidate" method on them that flips a
flag so the object reports itself as invalid once an event invalidation
occurs. The extra overhead of maintaining our own hash map isn't great,
but it's not horrible either. We're currently missing a way to have
dependencies invalidated (but that could be handled with aggregated
validities or another map from the pointers to the cache validity
objects).

It's worth noting that we sort of borrow Sylvain's method of
retroactively updating cache validity objects: our objects start out
invalid in the generator setup and aren't marked valid until the actual
generate method has completed (you have to bootstrap things into the
cache). There's a rough sketch of the idea a little further below.

> - search through every cached item on the receipt of an event
> to see if it is now invalid in light of the current event.
> Also not good.

No thanks, we considered this and rejected it...
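For what it's worth, here's roughly what our setup looks like, boiled
way down. The class and method names below are invented for
illustration; the real validity object implements Cocoon's actual
SourceValidity contract rather than the bare boolean used here, and the
"event" is just a string key produced by the data class interface
mentioned above.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a cache validity object that can be flipped
// to invalid when an external event arrives.  The real class conforms to
// Cocoon's SourceValidity contract; these names are ours.
class EventFlippedValidity {
    // Starts out invalid; the generator marks it valid only after
    // generate() has completed (bootstrapping the entry into the cache).
    private boolean valid = false;

    public void markValid()  { valid = true; }
    public void invalidate() { valid = false; }
    public boolean isValid() { return valid; }
}

// Our own map from event identifiers to the validity objects that should
// be flipped when that event occurs, e.g. "table_name/primary_key".
class EventValidityRegistry {
    private final Map validitiesByEvent = new HashMap(); // String -> Set

    public synchronized void register(String event, EventFlippedValidity v) {
        Set set = (Set) validitiesByEvent.get(event);
        if (set == null) {
            set = new HashSet();
            validitiesByEvent.put(event, set);
        }
        set.add(v);
    }

    // Called when an external invalidation event arrives: flip every
    // validity registered under that event; the cached responses stay in
    // the Cocoon cache but now report themselves invalid.
    public synchronized void invalidate(String event) {
        Set set = (Set) validitiesByEvent.remove(event);
        if (set == null) {
            return;
        }
        for (Iterator i = set.iterator(); i.hasNext();) {
            ((EventFlippedValidity) i.next()).invalidate();
        }
    }
}

The registry is the extra bookkeeping I mentioned; nothing in Cocoon
itself knows it exists.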
> - Extend the cache to provide support for the cache-event
> concept. This is the tack I'm taking. Essentially, this
> solution involves the CacheImpl keeping track of mapping the
> Events declared in an EventValidity to the key under which
> that response is stored in the cache.
>
> The "glue" that is missing is the
> org.apache.cocoon.caching.impl.CacheImpl
> extension, because it won't compile without the change I made
> to sourceresolve, which is not yet in cocoon's cvs. For some
> odd reason I'm having a hard time building the sourceresolve
> package using its build script. It's also not "done" as
> noted below - but I'd love others to be able to work on it.
>
> Here are the issues the "event aware" CacheImpl would need to
> take care of:
> - during store() it gets the SourceValidity[] from the
> CachedResponse and looks for instanceof EventValidity
> (recursively for AggregatedValidity).
> - if found, it calls EventValidity.getEvent() and stores the
> key-event mapping.

Sounds good; essentially it's the same thing we're doing, but your way
Cocoon manages it for us...

> - expose a removeByEvent(Event e) method that can be called
> by the specific event-handling component. This could be a
> jms listener (as I've originally envisioned it) or an
> http/soap based system (as in the ESI patch that was in
> bugzilla) or even a cocoon Action or Flow, or some
> combination of all of the above.
> - When the key is ejected from the cache for other reasons
> (another pipeline component signalled invalid, for example) I
> think it's necessary at that moment to remove the event-key
> mapping entry. This introduces a complication in the data
> structure used to store these mappings, as I mention below. I
> also haven't looked into the effect of the store janitor - if
> it acts directly on the Store without going through the
> CacheImpl wrapper, that introduces a wrinkle.
>
> Most of the above is accounted for - except for the data
> structure to store the event-key mappings.

I wondered when you were going to get to this...

> As discussed
> above, it needs to:
> - allow duplicate keys (each event may uncache multiple
> pipelines, and each pipeline might be uncached by any of
> multiple events). So it needs a Bag.
> - allow lookup of mappings based on either event or key.
> Jakarta Commons Collections has a DoubleOrderedMap, but not a
> DoubleOrderedBag. Bummer.
> - be persistent across shutdown and startup, and needs to
> recover gracefully when the cache is missing (like when it's
> been manually deleted)

Hmm, not so sure about this? What are you envisioning here?

> - be efficient

Wouldn't two separate Maps (of Maps) also work? More work to keep them
synced, but I think that's what you're going to end up building anyway?
Something like the sketch below.
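Roughly this, say. The class name and method names are invented; the
event and key are plain Objects standing in for your Event and
PipelineCacheKey, and I've used Maps of Sets rather than Maps of Maps
since the inner value only needs to behave like a Bag.

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Two plain HashMaps kept in sync by hand: one from event -> set of
// cache keys, one from cache key -> set of events.  Duplicates come for
// free because the values are Sets.
class EventKeyIndex {
    private final Map keysByEvent = new HashMap(); // event -> Set of keys
    private final Map eventsByKey = new HashMap(); // key   -> Set of events

    // Called from store() when an EventValidity is found in the
    // CachedResponse's SourceValidity[].
    public synchronized void add(Object event, Object key) {
        put(keysByEvent, event, key);
        put(eventsByKey, key, event);
    }

    // Called from removeByEvent(): returns the keys to eject from the
    // cache and drops the mappings in both directions.
    public synchronized Set removeByEvent(Object event) {
        Set keys = (Set) keysByEvent.remove(event);
        if (keys == null) {
            return Collections.EMPTY_SET;
        }
        for (Iterator i = keys.iterator(); i.hasNext();) {
            unlink(eventsByKey, i.next(), event);
        }
        return keys;
    }

    // Called when a key is ejected for some other reason (normal
    // invalidation, the store janitor, ...) so stale mappings don't
    // pile up.
    public synchronized void removeKey(Object key) {
        Set events = (Set) eventsByKey.remove(key);
        if (events == null) {
            return;
        }
        for (Iterator i = events.iterator(); i.hasNext();) {
            unlink(keysByEvent, i.next(), key);
        }
    }

    private static void put(Map map, Object from, Object to) {
        Set set = (Set) map.get(from);
        if (set == null) {
            set = new HashSet();
            map.put(from, set);
        }
        set.add(to);
    }

    private static void unlink(Map map, Object from, Object to) {
        Set set = (Set) map.get(from);
        if (set == null) {
            return;
        }
        set.remove(to);
        if (set.isEmpty()) {
            map.remove(from);
        }
    }
}

The removeKey() hook is the piece that covers your "ejected for other
reasons" case; persistence across shutdown would still have to be
layered on top of this somehow.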
> I have made an assumption so far that I'd like tested by some
> sharp minds. When a removeByEvent() is received, the
> CacheImpl would do something like PipelineCacheKey[]
> getByEvent() on its data structure. This would rely on
> hashCode() and equals() of Event to locate relevant events.
> I think this works well for true "equals" type of
> information: like "table_name" and "primary_key" -- if they
> are both equal, the event has happened. But there may be
> some where a "greater than" or "less than" or, worse yet, a
> kind of wild card lookup might need to be supported.

Well, as long as you can map events to keys many-to-many, haven't you
already got this (though not in an automatic regex kind of way)?

> Can that be accommodated by a Collections sort of implementation,
> or does something more flexible need to be invented? As it
> stands, you might implement hashCode() in a way that will
> cause intentional collisions and rely on equals() to sort
> things out. Is that crazy?

Not sure, would almost need to see some pseudo code...
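Just to check I follow what you mean, something like this? (The event
class and its fields are made up for illustration.)

// Made-up Event with a deliberately coarse hashCode(): it hashes only
// on the table name, so every event for a table lands in the same
// bucket, and equals() decides whether a stored event actually matches
// the incoming one.
class DatabaseEvent {
    private final String tableName;
    private final String primaryKey;

    DatabaseEvent(String tableName, String primaryKey) {
        this.tableName = tableName;
        this.primaryKey = primaryKey;
    }

    // Intentional collisions: ignore the primary key here.
    public int hashCode() {
        return tableName.hashCode();
    }

    // Full comparison: both table and key must match.
    public boolean equals(Object o) {
        if (!(o instanceof DatabaseEvent)) {
            return false;
        }
        DatabaseEvent other = (DatabaseEvent) o;
        return tableName.equals(other.tableName)
            && primaryKey.equals(other.primaryKey);
    }
}

That gets everything for a table into one bucket, but since equals()
has to stay symmetric, any "greater than" or wild card matching would
still need its own lookup path rather than piggybacking on equals().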