cocoon-dev mailing list archives

From "Ard Schrijvers" <>
Subject RE: Caching question
Date Sat, 14 Jan 2006 11:45:15 GMT
Perhaps the subject should be: Caching problems with Cocoon.
You want the very same thing I have been looking for for a few days. A very basic
example below shows the problem:

<map:match pattern="getcontent">
	<map:generate src="test.xml"/>
	<map:serialize type="xml"/>
</map:match>

<map:match pattern="index">
	<map:aggregate element="root">
		<map:part element="part1" src="cocoon:/getcontent"/>
		<map:part element="part2" src="cocoon:/getcontent"/>
		<!-- ... parts 3 through 49 ... -->
		<map:part element="part50" src="cocoon:/getcontent"/>
	</map:aggregate>
	<map:serialize type="xml"/>
</map:match>
versus:

<map:match pattern="index">
	<map:aggregate element="root">
		<map:part element="part1" src="test.xml"/>
		<map:part element="part2" src="test.xml"/>
		<!-- ... parts 3 through 49 ... -->
		<map:part element="part50" src="test.xml"/>
	</map:aggregate>
	<map:serialize type="xml"/>
</map:match>
test.xml = <foo/>

If you try these two setups with the expires/caching/ecaching pipelines, then on the first
run both examples will be equally fast. BUT, once the result is cached, the second example
returns results much quicker (depending on the number of parts, but with 50, count on a factor of 10!)

The reason is that the aggregator in the first example has to find its child pipelines'
cache keys, and therefore has to instantiate

<map:match pattern="getcontent">
	<map:generate src="test.xml"/>
</map:match>

50 times just to get the cache key that has to be checked for validity! I think the most
time-consuming part is instantiating the pipeline. I did try generating test.xml with my own
generator, which returned a NOPValidity, resulting in some time gain (about 20% in this
specific example), but not enough. So I concluded that the lookup of the cache key itself does
not take very long; it is Cocoon figuring out which cache key to look up that takes long.
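The cost difference can be modeled with a toy sketch (this is an illustration of the
argument above, not Cocoon internals; all names are made up). With cocoon:/ parts, the
aggregator pays one sub-pipeline instantiation per part just to learn its cache key; with
direct file parts, there is effectively one key to check:

```python
# Toy cost model of the two setups above. Instantiating the "getcontent"
# sub-pipeline is taken as the dominant cost, per the observation that
# the key lookup itself is cheap.
def key_for_subrequest(uri):
    # Stand-in for setting up a cocoon:/ sub-pipeline to obtain its key.
    instantiation_cost = 1
    return ("pipeline-key", uri), instantiation_cost

def validity_check_cocoon_parts(n_parts):
    total = 0
    for _ in range(n_parts):
        _, cost = key_for_subrequest("cocoon:/getcontent")
        total += cost
    return total

def validity_check_file_parts(n_parts):
    # File sources feed a single aggregate key, checked once.
    return 1

assert validity_check_cocoon_parts(50) == 50  # 50 instantiations
assert validity_check_file_parts(50) == 1     # one key check
```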

We build complex sites using dynamic location maps, which are assembled from several .xml
files and transformations. A location map is used, for example, 5 times per request. Now,
just caching the dynamic location map won't do, since looking up all the depending child
pipelines and finding their keys is very expensive!! You can gain a lot of time by making
sure that not too many pipelines are called for critical things like a dynamic location map.

Then of course, we can just write the output to the filesystem, do one generate, and it is
cached (with one cache key, so it is fast!)

What I would really like is much smarter caching, involving smart invalidation: for example,
the DASL transformer in a child pipeline is invalidated by a JMS event, which invalidates
its entire pipeline, which in turn invalidates the parent pipeline. Then an aggregator, for
example, knows all by itself whether it is valid or not. The current caching is based on a
very expensive cache-key lookup.

Well, of course, things will get very complex for map:acts, map:selects, sessions stored in
cache keys, etc., but if we focus only on event-based invalidation of the cache, and on
cache keys derived only from the request parameters, the pipeline, and the cocoon-parameters,
it should be possible.
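The restricted cache key described above could be sketched roughly like this (a toy
illustration, not Cocoon code; the function name and parameters are made up). The point is
that the key is computable from the request alone, without instantiating any child pipeline:

```python
# Hypothetical sketch: derive a cache key only from the pipeline id,
# request parameters, and cocoon-parameters. Because nothing else goes
# into the key, no sub-pipeline has to be set up just to compute it.
def cache_key(pipeline_id, request_params, cocoon_params):
    return (
        pipeline_id,
        tuple(sorted(request_params.items())),
        tuple(sorted(cocoon_params.items())),
    )

k1 = cache_key("index", {"page": "1"}, {"skin": "basic"})
k2 = cache_key("index", {"page": "1"}, {"skin": "basic"})
assert k1 == k2  # same inputs give the same key, with zero child lookups
```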

For example, <map:pipeline type="graphcache">, where graphcache implies that the pipeline
keeps track of its dependencies. That also means that changing a file on the filesystem is
not an event, so the cache won't be invalidated. Is that a problem? Well, not once a site is
deployed... after deployment, the file generator does not have to check the validity of a
file: I know it is valid.
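The propagation idea behind such a graphcache could look roughly like the following (a
minimal sketch, nothing here is real Cocoon API): each cached entry records which parents
aggregate it, and an external event, such as a JMS message, invalidates one entry and walks
the graph upward, so no cache-key lookup through child pipelines is ever needed.

```python
# Hypothetical "graphcache" sketch: invalidation propagates from a child
# entry up to every aggregating parent, instead of parents re-deriving
# their children's cache keys on every request.
class CacheEntry:
    def __init__(self, name):
        self.name = name
        self.valid = True
        self.parents = []  # entries whose output aggregates this one

    def add_parent(self, parent):
        self.parents.append(parent)

    def invalidate(self):
        self.valid = False
        for p in self.parents:
            if p.valid:  # skip paths that are already invalid
                p.invalidate()

child = CacheEntry("getcontent")
parent = CacheEntry("index")
child.add_parent(parent)

child.invalidate()       # e.g. triggered by a JMS event
assert not parent.valid  # the aggregator knows by itself it is stale
```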

I know it possibly won't fit into Cocoon easily, but when building more complex sites and
wanting to exploit Cocoon's caching effectively, there has to be some smarter cache invalidation.

What I found weirdest of all was that for an expiring pipeline, even though you can set the
expire time and the exact cache key it should use (so the entry can be found unambiguously
when the pipeline is called again), it still checks all the child pipeline keys (which again
can be very expensive).

These were just my two cents on cocoon caching...

Regards Ard

> I'm still a little unclear about how SourceValidities and non-caching
> pipelines work.  The way I believe it works is that non-caching
> pipelines are always considered to be invalid and their content is
> recreated.  I believe then that if a parent pipeline aggregates (by any
> of the aggregation techniques) one or more non-caching child pipelines,
> the parent would always have to regenerate its content.
> What I would like is somewhat different.
> When aggregating content, both the caching pipeline and the parent
> pipeline will have the content, albeit in different forms.  What I would
> like is for the parent pipeline to know that the content it has is still
> valid so that it can return the aggregated content, but if the
> aggregated content needs to be reconstructed then the child pipeline
> will have to recreate it.
> Do I have a misunderstanding of how this works?

> Ralph
