cocoon-dev mailing list archives

From "Unico Hommes" <Un...@hippo.nl>
Subject RE: External/Event Based Cache Invalidation (somewhat long)
Date Mon, 30 Jun 2003 15:55:03 GMT
 

Geoff Howard wrote:
>
> > From: Unico Hommes [mailto:Unico@hippo.nl]
> 
> > I can't believe I've missed this post. Damn.
> > > Below is the larger picture I envision for a new kind of cache 
> > > invalidation
> ...
> > > depending on other factors might never come.  It seems to me more
> > fitting
> > > with the transient nature of events to act on them when
> they arrive
> > and
> > > then
> > > discard them.
> >
> > That would definitely be the way to go.
> 
> Good - I'm getting closer to a system that can at least be tested 
> using this method.
> 
> ...
> > > Here are the issues the "event aware" CacheImpl would need to take
> > care
> > > of:
> > > - during store() it gets the SourceValidity[] from the 
> > > CachedResponse
> > and
> > > looks for instanceof EventValidity (recursively for
> > AggregatedValidity).
> > > - if found, it calls EventValidity.getEvent() and stores the 
> > > key-event mapping.
> > > - expose a removeByEvent(Event e) method that can be
> called by the
> > > specific event-handling component.  This could be a jms
> listener (as
> > > I've
> > originally
> > > envisioned it) or an http/soap based system (as in the ESI patch 
> > > that
> > was
> > > in
> > > bugzilla) or even a cocoon Action or Flow, or some combination of 
> > > all
> > of
> > > the
> > > above.
> 
> I think this (though I've changed the method name to 
> processEvent(Event e)) is the "contract" that needs to be exposed to 
> the cache listening systems.
> Whether
> the first implementation I'm working on for how the events/cache keys 
> are handled internally holds water over time remains to be seen, but 
> the way I'm trying to go should leave this open to alternative 
> implementations of the internals without changing this simple 
> contract.

OK.
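As a concrete reading of that contract, something like the following could work (interface names here are my own illustration, not the committed code):

```java
import java.io.Serializable;

// Marker for the events that cached pipelines declare themselves dependent on.
interface Event extends Serializable {}

// A SourceValidity flavor that exposes the event it depends on; the real
// interface would extend org.apache.excalibur.source.SourceValidity.
interface EventValidity {
    Event getEvent();
}

// The single contract exposed to the cache-listening systems: a JMS
// listener, an HTTP/SOAP endpoint, a cocoon Action or Flow, etc.
interface EventAware {
    void processEvent(Event e);
}
```

The point is that any internal event-to-key bookkeeping can change freely as long as listeners only ever see processEvent(Event).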

> 
> > > - When the key is ejected from the cache for other
> reasons (another
> > > pipeline component signalled invalid for example) I think it's 
> > > necessary to at
> > that
> > > moment remove the event-key mapping entry.  This introduces a
> > complication
> > > in the data structure used to store these mappings as I
> mention below.
> > I
> > > also haven't looked into the effect of the store janitor - if it 
> > > acts directly on the Store without going through the CacheImpl 
> > > wrapper, that
> > introduces
> > > a
> > > wrinkle.
> >
> > Hmm, that does seem to be the case.
> 
> Well, I've thought this through more - if you are using persistent 
> cache, then the store janitor simply moves items from the Memory store
> to the persistent store which should have no effect on the event
> tracking.  If not using the persistent cache then there could be a 
> problem - but I think that could be looked into later - I'd guess it's
> rare that people needing the event cache are going to use in-memory
> only caching.

Just for my understanding: the way the cache and the event register can
get out of sync is when, not using the persistent store, the store
janitor removes an item from the memory store, effectively removing it
completely when the system is shut down. This is in fact the same
situation as when someone deletes the persistent store manually. In
these cases there may exist event-key mappings with non-existent keys.
But why is that a problem? If an event is received that is mapped to
non-existent keys, each of those keys is checked: if the key exists,
remove it from the cache, otherwise ignore it (and possibly remove the
stale key-event mapping from the register). So?
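To make that concrete, the receipt-time handling I mean could be sketched as follows (class and method names hypothetical; a plain Map stands in for the real cache):

```java
import java.util.*;

// Sketch of an event register that tolerates stale mappings: keys that the
// janitor (or a manual delete) already evicted are simply skipped.
class EventRegister {
    private final Map<Object, Set<Object>> byEvent = new HashMap<>();

    void map(Object event, Object key) {
        byEvent.computeIfAbsent(event, e -> new HashSet<>()).add(key);
    }

    // On receipt of an event, remove each mapped key from the cache if it is
    // still there; mappings to already-evicted keys are dropped harmlessly.
    void processEvent(Object event, Map<Object, Object> cache) {
        Set<Object> keys = byEvent.remove(event);
        if (keys == null) return;          // unknown event: nothing to do
        for (Object key : keys) {
            cache.remove(key);             // no-op if the key is already gone
        }
    }
}
```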

> 
> > > Most of the above is accounted for - except for the data
> structure
> > > to store the event-key mappings.  As discussed above, it needs to:
> > > - allow duplicate keys (each event may uncache multiple
> pipelines,
> > > and each pipeline might be uncached by any of multiple
> events).  So
> > > it needs a
> > Bag.
> > > - allow lookup of mappings based on either event or key.  Jakarta
> > Commons
> > > Collections has a DoubleOrderedMap, but not a DoubleOrderedBag.
> > Bummer.
> 
> Ok, I was a little muddled here on the MultiMap/Bag distinction.  At 
> least as interpreted by the jakarta collections stuff, we need a 
> DoubleOrderedMultiMap
> - which still doesn't exist.  But I've got a solution nearly done on 
> my hard drive not yet ready to commit even to scratchpad (but soon!) 
> that uses two MultiMaps to create a de-facto DoubleOrderedMultiMap.  I
> think only time and testing will tell if a more dedicated solution is
> needed.  It occurred to me that since all we're storing is relatively 
> light-weight Events and PipelineCacheKeys, keeping two synced 
> MultiMaps may be fine.  There are some threading issues that need to 
> get tested though.
> 
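For reference, the two-synced-MultiMaps idea might look roughly like this (a sketch with plain HashMaps standing in for the Commons Collections MultiMaps, not the actual implementation on Geoff's hard drive):

```java
import java.util.*;

// De-facto "DoubleOrderedMultiMap": two multimaps kept in sync so that
// mappings can be looked up and removed by either event or key.
class EventKeyMultiMap {
    private final Map<Object, Set<Object>> keysByEvent = new HashMap<>();
    private final Map<Object, Set<Object>> eventsByKey = new HashMap<>();

    // synchronized because store() and processEvent() may race.
    synchronized void add(Object event, Object key) {
        keysByEvent.computeIfAbsent(event, e -> new HashSet<>()).add(key);
        eventsByKey.computeIfAbsent(key, k -> new HashSet<>()).add(event);
    }

    // Lookup side used by processEvent(Event): which keys does it uncache?
    synchronized Set<Object> keysFor(Object event) {
        return new HashSet<>(keysByEvent.getOrDefault(event, Collections.emptySet()));
    }

    // Called when a key is ejected for other reasons (e.g. another pipeline
    // component signalled invalid): drop the key from both sides.
    synchronized void removeKey(Object key) {
        Set<Object> events = eventsByKey.remove(key);
        if (events == null) return;
        for (Object event : events) {
            Set<Object> keys = keysByEvent.get(event);
            if (keys != null) {
                keys.remove(key);
                if (keys.isEmpty()) keysByEvent.remove(event);
            }
        }
    }
}
```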
> > > - be persistent across shutdown and startup, and needs to recover 
> > > gracefully when the cache is missing (like when it's been
> manually
> > > deleted)
> 
> Currently still unimplemented.  If the double MultiMap solution turns
> out OK, it may be enough to serialize this out to disk - I think
> everything should be serializable.  Although the Events are not yet
> explicitly so, they are simple.
> 
> > > - be efficient
> 
> We'll see!
> 
> > > I have made an assumption so far that I'd like tested by
> some sharp
> > minds.
> > > When a removeByEvent() is received, the CacheImpl would
> do something
> > like
> > > PipelineCacheKey[] getByEvent() on its datastructure.  This would 
> > > rely
> > on
> > > hashCode() and equals() of Event() to locate relevant events.  I 
> > > think this works well for true "equals" type of information: like 
> > > "table_name"
> > and
> > > "primary_key" -- if they are both equal, the event has happened.  
> > > But there may be some where a "greater than" or "less
> than" or worse
> > > yet, a kind
> > of
> > > wild card lookup might need to be supported.  Can that be 
> > > accommodated
> > by a
> > > Collections sort of implementation, or does something
> more flexible
> > need
> > > to be invented? As it stands, you might implement hashCode() in a 
> > > way that will cause intentional collisions and rely on equals() to
> > sort
> > > things out.  Is that crazy?
> > >
> >
> > I think this would be difficult and will impact performance
> because of
> > grouping of multiple keys under the same hash code. 
> Consider wildcard
> > patterns and you'd like to invalidate **. In order for this
> to work,
> > all keys must return the exact same hash code. The same situation 
> > occurs with hierarchy matching, i.e. if you want /path/to to match 
> > /path/to/my/source (for instance in a directory generator
> that relies
> > on all sources under a certain context directory). In this case / 
> > matches everything else too.
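For the true-"equals" case at least, an event keyed on exact values is straightforward; a sketch (class name and fields are illustrative only):

```java
import java.io.Serializable;
import java.util.Objects;

// An event meaning "this row changed": if table name and primary key are
// both equal, the event has happened for a given cached pipeline.
final class DatabaseRowEvent implements Serializable {
    private final String tableName;
    private final String primaryKey;

    DatabaseRowEvent(String tableName, String primaryKey) {
        this.tableName = tableName;
        this.primaryKey = primaryKey;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DatabaseRowEvent)) return false;
        DatabaseRowEvent other = (DatabaseRowEvent) o;
        return tableName.equals(other.tableName)
            && primaryKey.equals(other.primaryKey);
    }

    // Hashing on the table alone would force all of a table's events into
    // one bucket -- the "intentional collision" idea, leaving equals() to
    // sort things out -- at the cost of lookup speed; here we hash on both.
    @Override
    public int hashCode() {
        return Objects.hash(tableName, primaryKey);
    }
}
```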
> 
> I think that wildcard-type handling can come after the current 
> solution is tested.  As I mentioned above, the specific implementation
> can change without affecting the basic concepts.  I've thought that a
> simple light-weight database might be best too - having a table 
> indexed both by Event and PipelineCacheKey would do the trick.  It 
> would also get us persistence.  But that can come later if needed.
> 

Excellent.

> I'm not quite as worried about performance, because the only time 
> these lookups are happening is on receipt of the events
> - it won't affect routine user requests. 

That is, assuming the events come in asynchronously.

> I want to get the
> basic mechanism working and proven and while leaving room for later 
> streamlining.
> 
> > I actually experienced this last situation. What I did was
> to generate
> > apart from the original event, events for all ancestor
> paths as well.
> 
> Yes, this is another way to accomplish that "wildcard" 
> ability that could work in some situations.

Huh? I don't understand. In the wildcard situation there will be too
many possible matching events to generate.
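To clarify what I meant by the ancestor approach: it covers hierarchy matching, not true wildcards. When a source changes, generate apart from the original event an event for each ancestor path, so a pipeline that registered on "/path/to" is hit when "/path/to/my/source" changes. A sketch (not the actual code):

```java
import java.util.*;

class AncestorEvents {
    // For a changed path, produce the original event plus one event
    // per ancestor path, down to the root which matches everything.
    static List<String> eventsFor(String path) {
        List<String> events = new ArrayList<>();
        events.add(path);                    // the original event
        String p = path;
        int slash;
        while ((slash = p.lastIndexOf('/')) > 0) {
            p = p.substring(0, slash);       // each ancestor path
            events.add(p);
        }
        events.add("/");                     // root
        return events;
    }
}
```

This stays bounded because the number of ancestors is just the path depth; a real wildcard pattern would have unbounded possible matches, which is why it doesn't generalize.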

> 
> ...
> 
> > >   Experiment with external cache invalidation.  An
> > EventAwareCacheImpl
> > > >   can't be committed yet because it relies on the latest
> > sourceresolve
> > > >   cvs.
> >
> > So let's update! I'm really eager to see this stuff and get
> my hands
> > dirty. What difficulties are you experiencing with building 
> > sourceresolve, can I help?
> 
> Well, the problem was that I was trying to build sourceresolve from 
> excalibur's cvs but wanted to build fresh each of its dependencies, 
> including framework - but it was looking like they are transitioning 
> builds to maven which was not working smoothly for me (it was trying 
> to download a lot of jars that I already have over a slow modem).
> 
> If you can get a clean sourceresolve dist. built, let me know and I 
> can commit it.  Since I just patched my local jar with the one 
> modified class I haven't needed this locally and haven't bothered to 
> try again yet.

I just built it with the latest Avalon repository deps. I'll mail it to
you because it seems the list doesn't accept it.

> 
> Ok, that's it for now.  I'm working on a simple sample to enable us to
> test this new beast and then should be ready to commit.

Great ;)

Regards,
Unico

> 
> Geoff
> 
> 
