cocoon-dev mailing list archives

From "Geoff Howard" <coc...@leverageweb.com>
Subject RE: External/Event Based Cache Invalidation (somewhat long)
Date Sun, 29 Jun 2003 18:24:03 GMT
> From: Unico Hommes [mailto:Unico@hippo.nl]

> I can't believe I've missed this post. Damn.
> > Below is the larger picture I envision for a new kind of cache
> > invalidation
...
> > depending on other factors might never come.  It seems to me more
> fitting
> > with the transient nature of events to act on them when they arrive
> and
> > then
> > discard them.
>
> That would definitely be the way to go.

Good - I'm getting closer to a system that can at least be tested using
this method.

...
> > Here are the issues the "event aware" CacheImpl would need to take
> care
> > of:
> > - during store() it gets the SourceValidity[] from the CachedResponse
> and
> > looks for instanceof EventValidity (recursively for
> AggregatedValidity).
> > - if found, it calls EventValidity.getEvent() and stores the key-event
> > mapping.
> > - expose a removeByEvent(Event e) method that can be called by the
> > specific
> > event-handling component.  This could be a jms listener (as I've
> originally
> > envisioned it) or an http/soap based system (as in the ESI patch that
> was
> > in
> > bugzilla) or even a cocoon Action or Flow, or some combination of all
> of
> > the
> > above.

I think this (though I've changed the method name to processEvent(Event e))
is the "contract" that needs to be exposed to the cache listening systems.
Whether the first implementation I'm working on for how the events/cache
keys are handled internally holds water over time remains to be seen, but
the way I'm trying to go should leave this open to alternative
implementations of the internals without changing this simple contract.
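To make the contract concrete, here is a minimal sketch of what I mean;
the Event fields ("table_name"/"primary_key" as discussed further down)
and the EventAware interface name are illustrative assumptions, not
committed classes:

```java
import java.io.Serializable;

// Illustrative Event: equality is what cache lookups key off when an
// event arrives.  Fields are an assumption based on the table/key example.
class Event implements Serializable {
    private final String tableName;
    private final String primaryKey;

    Event(String tableName, String primaryKey) {
        this.tableName = tableName;
        this.primaryKey = primaryKey;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Event)) return false;
        Event e = (Event) o;
        return tableName.equals(e.tableName) && primaryKey.equals(e.primaryKey);
    }

    @Override
    public int hashCode() {
        return tableName.hashCode() ^ primaryKey.hashCode();
    }
}

// The single method exposed to the listening systems (JMS listener,
// HTTP/SOAP endpoint, Action, Flow, ...).
interface EventAware {
    void processEvent(Event e);
}
```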

> > - When the key is ejected from the cache for other reasons (another
> > pipeline
> > component signalled invalid for example) I think it's necessary to at
> that
> > moment remove the event-key mapping entry.  This introduces a
> complication
> > in the data structure used to store these mappings as I mention below.
> I
> > also haven't looked into the effect of the store janitor - if it acts
> > directly
> > on the Store without going through the CacheImpl wrapper, that
> introduces
> > a
> > wrinkle.
>
> Hmm, that does seem to be the case.

Well, I've thought this through more - if you are using the persistent
cache, then the store janitor simply moves items from the memory store to
the persistent store, which should have no effect on the event tracking.
If not using the persistent cache then there could be a problem - but I
think that could be looked into later - I'd guess it's rare that people
needing the event cache are going to use in-memory-only caching.

> > Most of the above is accounted for - except for the data structure
> > to store the event-key mappings.  As discussed above, it needs to:
> > - allow duplicate keys (each event may uncache multiple pipelines, and
> > each
> > pipeline might be uncached by any of multiple events).  So it needs a
> Bag.
> > - allow lookup of mappings based on either event or key.  Jakarta
> Commons
> > Collections has a DoubleOrderedMap, but not a DoubleOrderedBag.
> Bummer.

Ok, I was a little muddled here on the MultiMap/Bag distinction.  At least
as interpreted by the jakarta collections stuff, we need a
DoubleOrderedMultiMap - which still doesn't exist.  But I've got a solution
nearly done on my hard drive, not yet ready to commit even to scratchpad
(but soon!), that uses two MultiMaps to create a de-facto
DoubleOrderedMultiMap.  I think only time and testing will tell if a more
dedicated solution is needed.  It occurred to me that since all we're
storing is relatively light-weight Events and PipelineCacheKeys, keeping
two synced MultiMaps may be fine.  There are some threading issues that
need to get tested though.

> > - be persistent across shutdown and startup, and needs to recover
> > gracefully
> > when the cache is missing (like when it's been manually deleted)

Currently still unimplemented.  If the double MultiMap solution turns
out OK, it may be enough to serialize this out to disk - I think
everything should be serializable.  Although the Events are not yet
explicitly Serializable, they are simple.
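For illustration, the serialize-to-disk idea might look something like
the following; the file handling and map types are assumptions, and
"recover gracefully" here just means starting with an empty registry when
the file is missing or unreadable:

```java
import java.io.*;
import java.util.*;

// Sketch: dump the event->keys map on shutdown, read it back on startup.
class RegistryPersistence {
    static void save(File f, HashMap<String, HashSet<String>> keysByEvent)
            throws IOException {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(keysByEvent);
        }
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, HashSet<String>> load(File f) {
        if (!f.exists()) return new HashMap<>(); // cache deleted: start fresh
        try (ObjectInputStream in =
                new ObjectInputStream(new FileInputStream(f))) {
            return (HashMap<String, HashSet<String>>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            return new HashMap<>(); // unreadable: recover gracefully
        }
    }
}
```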

> > - be efficient

We'll see!

> > I have made an assumption so far that I'd like tested by some sharp
> minds.
> > When a removeByEvent() is received, the CacheImpl would do something
> like
> > PipelineCacheKey[] getByEvent() on its datastructure.  This would rely
> on
> > hashCode() and equals() of Event() to locate relevant events.  I think
> > this
> > works well for true "equals" type of information: like "table_name"
> and
> > "primary_key" -- if they are both equal, the event has happened.  But
> > there
> > may be some where a "greater than" or "less than" or worse yet, a kind
> of
> > wild card lookup might need to be supported.  Can that be accommodated
> by a
> > Collections sort of implementation, or does something more flexible
> need
> > to be invented? As it stands, you might implement hashCode() in a
> > way that will cause intentional collisions and rely on equals() to
> sort
> > things out.  Is that crazy?
> >
>
> I think this would be difficult and will impact performance because of
> grouping of multiple keys under the same hash code. Consider wildcard
> patterns and you'd like to invalidate **. In order for this to work, all
> keys must return the exact same hash code. The same situation occurs
> with hierarchy matching, i.e. if you want /path/to to match
> /path/to/my/source (for instance in a directory generator that relies on
> all sources under a certain context directory). In this case / matches
> everything else too.

I think that wildcard-type handling can come after the current solution
is tested.  As I mentioned above, the specific implementation can change
without affecting the basic concepts.  I've thought that a simple
light-weight database might be best too - having a table indexed both by
Event and PipelineCacheKey would do the trick.  It would also get us
persistence.  But that can come later if needed.

I'm not quite as worried about performance, because the only time these
lookups happen is on receipt of the events - it won't affect routine
user requests.  I want to get the basic mechanism working and proven
while leaving room for later streamlining.

> I actually experienced this last situation. What I did was to generate
> apart from the original event, events for all ancestor paths as well.

Yes, this is another way to accomplish that "wildcard" ability that could
work in some situations.
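That ancestor-event trick could be sketched like so (a purely
illustrative helper, not from the actual code):

```java
import java.util.*;

// For a path event, also generate events for every ancestor path, so a
// cached response keyed on "/path/to" is hit by a change anywhere under it
// using plain equality - no wildcard matching needed.
class PathEvents {
    static List<String> withAncestors(String path) {
        LinkedHashSet<String> events = new LinkedHashSet<>();
        String p = path;
        while (true) {
            events.add(p);
            int slash = p.lastIndexOf('/');
            if (slash <= 0) break;        // reached the root segment
            p = p.substring(0, slash);    // strip the last path element
        }
        events.add("/");                  // the root matches everything
        return new ArrayList<>(events);
    }
}
```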

...

> >   Experiment with external cache invalidation.  An
> EventAwareCacheImpl
> > >   can't be committed yet because it relies on the latest
> sourceresolve
> > >   cvs.
>
> So let's update! I'm really eager to see this stuff and get my hands
> dirty. What difficulties are you experiencing with building
> sourceresolve, can I help?

Well, the problem was that I was trying to build sourceresolve from
excalibur's cvs but wanted to build fresh each of its dependencies,
including framework - but it was looking like they are transitioning
builds to maven, which was not working smoothly for me (it was trying
to download a lot of jars that I already have over a slow modem).

If you can get a clean sourceresolve dist. built, let me know and I can
commit it.  Since I just patched my local jar with the one modified class,
I haven't needed this locally and haven't bothered to try again yet.

Ok, that's it for now.  I'm working on a simple sample to enable us to
test this new beast and then should be ready to commit.

Geoff

