cocoon-dev mailing list archives

From Stuart Roebuck <>
Subject Re: idle thoughts in caching in c2
Date Tue, 23 Jan 2001 13:22:18 GMT

On Monday, January 22, 2001, at 11:58 PM, Paul Russell wrote:

> * Giacomo Pati ( wrote : 
> > Paul Russell wrote: 
> > > * Sergio Carvalho ( wrote : 
> > > > On Fri, 19 Jan 2001 13:44:42 -0500 (EST) 
> > > > Donald Ball <> wrote: 
> > > > > do you mean that the caching components should be explicitly put in the
> > > > > pipeline in the sitemap:
> > > > > <map:cache type="lru"/> 
> > > > Yes. A Cache interface could be defined, so that classes implementing
> > > > caching are identified. Then, these are placed in the pipeline,
> > > > between each data-producer (or processor) and the subsequent
> > > > data-consumer.
> > > That makes sense.  
> > Are you sure? Is a sitemap maintainer skilled enough to specify where to
> > cache intermediate pipeline results? I doubt it.
> If a sitemap manager isn't, who is? The sitemap manager's job is very
> definitely a skilled position.
> I am more than happy that the sitemap should decide on caching policies
> when we can either:
>  * get to the point where we can write sensible heuristics
>    for this; or
>  * write sensible learning algorithms to let the system decide on
>    the parameters for caching.
> However, I think we should give sitemap managers the option to override
> these decisions, in case we (or the programs we write) get it wrong. I'd
> be interested to hear how other people visualise this. Should we make it
> *totally* explicit (I'm -0 on this; I'd rather we let the engine
> 'play'), or should we make it totally implicit (and risk sub-optimal
> performance)? Anything else?

I feel uncomfortable about making optimisation explicit, but I can see the arguments both
ways.  From a pragmatic point of view, we need caching, and if we can't do this well automatically,
we have to make it explicit.

Can I suggest that:

1. Like the other sitemap features, caching has built-in defaults so that it happens automatically,
even if the automatic choices are not always the best.

2. Explicit caching components are treated as 'recommendations', not as guarantees.  This
makes it much easier to hand control over if automated caching techniques begin to become better
judges than their human counterparts.
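To make point 2 concrete, here is a minimal Java sketch of how an engine might treat an explicit cache hint as advisory rather than binding. Everything in it is hypothetical: the `CacheHint` and `CachePolicy` names, the request counters, the threshold and the "result usually unchanged" heuristic are assumptions for illustration, not an existing Cocoon API. The sitemap manager's hint is followed until the engine has seen enough traffic to judge for itself.

```java
import java.util.Optional;

// Hypothetical: what an explicit <map:cache/> element in the sitemap might express.
enum CacheHint { CACHE, DONT_CACHE }

// Hypothetical policy: automatic by default, with explicit hints treated as
// recommendations that the engine may override once it has enough evidence.
class CachePolicy {
    private final long minRequestsToTrust; // assumed threshold, not a Cocoon setting

    CachePolicy(long minRequestsToTrust) {
        this.minRequestsToTrust = minRequestsToTrust;
    }

    /**
     * @param hint      the sitemap manager's recommendation, if one was given
     * @param requests  how often this pipeline has been requested so far
     * @param unchanged how many of those requests produced the same result as
     *                  the previous one (i.e. could have been served from cache)
     */
    boolean shouldCache(Optional<CacheHint> hint, long requests, long unchanged) {
        if (requests >= minRequestsToTrust) {
            // Enough traffic observed: the heuristic decides, whatever the hint says.
            // Assumed heuristic: cache if at least half the results were reusable.
            return unchanged * 2 >= requests;
        }
        // Too little evidence: follow the human recommendation, else the default (cache).
        return hint.map(h -> h == CacheHint.CACHE).orElse(true);
    }
}
```

The only design point the sketch is meant to show is that the hint matters while the engine is still ignorant; a real implementation would want far better evidence than a simple repeat count.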

With regard to placing caching components between processors and consumers: what does this
mean in practice?  As I see it, we have generators, transformers and serializers; aggregators
are a special case of generator.  If we could assume (or even stipulate) that the behaviour
of transformers and serializers is entirely determined by their input, then it would make
sense for caching always to happen at the end of a match pipeline, with the caching algorithm
used automatically chosen according to the type of generator and match.  The generator API
could be extended with an optional hash generation method, which would be required to produce
a hash code for any given generator input.  Simple generators that know their output depends
purely on an input file could generate the hash code from the file name and modification date.
More complex generators might also take into account inputs such as the time.  By default, all
generators would generate hash codes based upon their actual output.
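A minimal Java sketch of the optional hash method and end-of-pipeline cache described above. The `KeyedGenerator` interface, the generator classes and `PipelineCache` are all hypothetical (this is not Cocoon's generator API), output is modelled as a `String` for simplicity, and the transformers plus serializer are collapsed into one function, which is only valid under the stipulation that their behaviour depends entirely on their input.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.UnaryOperator;

// Hypothetical generator API with the proposed optional key method.
interface KeyedGenerator {
    String generate(); // produce the generator's output (serialized here for simplicity)

    // Default: hash the actual output. Always correct, but the generator still
    // has to run; only the downstream processing is saved.
    default String cacheKey() {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(generate().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every JVM
        }
    }
}

// Output depends only on one file, so its name and modification date suffice.
class FileGenerator implements KeyedGenerator {
    private final File file;
    FileGenerator(File file) { this.file = file; }

    public String generate() {
        try {
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public String cacheKey() {
        return file.getAbsolutePath() + "@" + file.lastModified();
    }
}

// Output also depends on time: fold the current minute into the key.
class MinuteGenerator implements KeyedGenerator {
    private final LongSupplier clockMillis;
    MinuteGenerator(LongSupplier clockMillis) { this.clockMillis = clockMillis; }

    public String generate() { return "<now minute=\"" + clockMillis.getAsLong() / 60_000 + "\"/>"; }

    @Override public String cacheKey() { return "minute:" + clockMillis.getAsLong() / 60_000; }
}

// Cache at the end of the match pipeline, keyed on match pattern + generator key.
class PipelineCache {
    private final Map<String, String> entries = new HashMap<>();
    int misses = 0;

    // 'rest' stands for the transformers and serializer, assumed to depend only on their input.
    String process(String matchPattern, KeyedGenerator gen, UnaryOperator<String> rest) {
        String key = matchPattern + "|" + gen.cacheKey();
        return entries.computeIfAbsent(key, k -> {
            misses++;
            return rest.apply(gen.generate());
        });
    }
}
```

Note the trade-off in the default: hashing the output never serves stale data, but it only pays off when the downstream processing is much more expensive than generation.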

This once again raises the issue of 'resources' being discussed in the other thread:

> * Giacomo Pati ( wrote : 
> > Stuart Roebuck wrote: 
> > >      <map:match pattern="result2.xml">
> > >          <map:generate type="resource" src="myResource" />
> > >          <map:transform src="stylesheetY.xslt" />
> > >          <map:serialize type="xml" />
> > >      </map:match> 
> >  
> > No, this is not the way to use resources. Resources are final resources;
> > you can use them as-is, but you cannot process them any further. They were
> > never meant to be.
> Trouble is, this is exactly what we're talking about doing if we use
> resources as components in an aggregation, which seems to be what people
> are suggesting they want. I can accept that maybe we don't let you refer
> to non-serialized resources elsewhere in the sitemap, but we will be
> permitting further manipulation after the aggregation happens. To me,
> being able to perform further manipulation of a non-serialized pipeline
> sounds like 'fair use', and at times appears to make the sitemap
> clearer. I'm not suggesting we aim to implement this immediately, but it
> may be worth considering going forward, since it seems I'm going to end
> up implementing a fair bit of it anyway.

If resources could be processed further, it would encourage sitemap managers to identify processing
that is used repeatedly, and would provide a mechanism on which to base automated caching.  I
can assure you that in my real-life use of Cocoon 2 I have *lots* of situations where the same
processing is needed repeatedly, and I end up resorting to constructs like:

	<map:generate src="http://localhost:8080/myContext/subprocessing.xml" />

and every time I do it I say to myself, "there must be a better way..."!
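If resources could be processed further, the HTTP round trip above could in principle become an in-process call whose output is computed once and shared by every pipeline that uses it. A minimal Java sketch, with hypothetical names throughout (`Resources`, `define`, `get`, `uses`); it is not how Cocoon handles resources, and it assumes a resource's output never changes, which a real version would handle with keys such as the generator hash codes suggested earlier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry of named, non-serialized resources that other
// pipelines can process further, instead of fetching them over HTTP.
class Resources {
    private final Map<String, Supplier<String>> definitions = new HashMap<>();
    private final Map<String, String> outputs = new HashMap<>();
    private final Map<String, Integer> useCounts = new HashMap<>();

    void define(String name, Supplier<String> pipeline) {
        definitions.put(name, pipeline);
    }

    // Runs the resource's pipeline at most once; later callers share the result.
    // Assumes the output never changes (no invalidation in this sketch).
    String get(String name) {
        String out = outputs.computeIfAbsent(name, n -> {
            Supplier<String> pipeline = definitions.get(n);
            if (pipeline == null) throw new IllegalArgumentException("no resource: " + n);
            return pipeline.get();
        });
        useCounts.merge(name, 1, Integer::sum);
        return out;
    }

    // How often each resource is reused: the kind of signal automated caching could exploit.
    int uses(String name) {
        return useCounts.getOrDefault(name, 0);
    }
}
```

Usage would look like two pipelines each calling `get("subprocessing")` and processing the result further; the sub-pipeline runs once, and the use count records that it is shared.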

Similarly, my real-life uses of Cocoon 2 are littered with repeated sequences of XSLT processing,
so I am sure there would be great potential if some superhuman individual could work out a way
of distilling multiple XSLT sheets processed in series into a single sheet which could be
pre-compiled.  But that's definitely a diversion.
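Distilling several stylesheets into one is not attempted here. A more modest, swapped-in technique that captures part of the benefit is to compile each stylesheet once into a JAXP `Templates` object and stream the document between stages as SAX events, so neither the stylesheets nor the intermediate results are re-parsed. A minimal sketch using only the standard `javax.xml.transform` API; the `CompiledChain` class is hypothetical and not part of Cocoon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: a fixed series of XSLT stages, each compiled once.
class CompiledChain {
    private final SAXTransformerFactory factory;
    private final List<Templates> stages = new ArrayList<>();

    CompiledChain(String... stylesheets) {
        if (stylesheets.length == 0) throw new IllegalArgumentException("need at least one stylesheet");
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("XSLT processor cannot chain via SAX");
        }
        factory = (SAXTransformerFactory) tf;
        try {
            for (String xsl : stylesheets) {
                // Templates are compiled once, are thread-safe, and can be reused.
                stages.add(factory.newTemplates(new StreamSource(new StringReader(xsl))));
            }
        } catch (TransformerException e) {
            throw new IllegalArgumentException("bad stylesheet", e);
        }
    }

    // Runs the document through every stage without serializing in between.
    String apply(String xml) {
        try {
            StringWriter out = new StringWriter();
            // Handlers are single-use, so build a fresh chain back to front.
            TransformerHandler next = null;
            for (int i = stages.size() - 1; i >= 0; i--) {
                TransformerHandler h = factory.newTransformerHandler(stages.get(i));
                h.setResult(next == null ? new StreamResult(out) : new SAXResult(next));
                next = h;
            }
            // An identity transformer parses the input and feeds SAX events to stage one.
            factory.newTransformer().transform(new StreamSource(new StringReader(xml)), new SAXResult(next));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Only the lightweight handlers are rebuilt per request; the cost of compiling each stylesheet is paid once when the chain is constructed.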

So, these are just a few thoughts.  I've not been following things closely recently, so I
could be way off the mark and grossly over-simplifying things.


Stuart Roebuck                        
Lead Developer                               Java, XML, MacOS X, XP, etc.