cocoon-dev mailing list archives

From Stuart Roebuck
Subject Re: AW: [C2]: Proposal for caching
Date Fri, 26 Jan 2001 10:54:11 GMT

On Friday, January 26, 2001, at 09:23 AM, Carsten Ziegeler wrote:

> > Stuart Roebuck wrote: 
> >  
> >  
> > This is probably impossible, but I'll suggest it anyway.  Let's  
> > imagine that every component has a validator method.  When each  
> > match pipeline is being validated (to see whether to use the  
> > cache or not), the validation works backwards up the pipeline.   
> > So for a pipeline like: 
> >  
> >     <map:match="test">  
> >       <map:generate src="http://myserver/resource.xml" />  
> >       <map:translate src="local.xslt" /> 
> > 	 <map:translate src="xsltwithxsp.xslt" /> 
> > 	 <map:serialize type="html" /> 
> >     </map:match>  
> >  
> > The serializer is asked to validate.  Its inbuilt validate method  
> > somehow responds saying, "I've not changed if the previous  
> > component hasn't changed". 
> > The XSLT translator (which has some external inputs) says, "I've  
> > changed regardless of my input". 
> > The next translator up says "I've not changed unless the previous  
> > component has changed". 
> > The generator says "I've not changed." 
> > So the cached result of the first translator is fed into the  
> > pipeline and processing continues as normal from there. 
> >  
> I am not quite sure, but why do you need the reverse order? Start
> with the generator saying "I've not changed", go on to the translator
> saying "Not changed regardless of my input", and then the XSLT translator
> says "I've changed". Now the result of the first translator (= the last
> component which says: "Not changed") is used.
> A component in the pipeline is only asked if all the previous components
> have said: "Not changed".
> This should create the same result, or am I wrong?

Yes, it's probably a lot easier to implement as well!
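
A minimal sketch of the forward-order check agreed on above, assuming each component exposes a single yes/no validation call. The names Validatable, hasChanged and lastReusableIndex are hypothetical and are not part of any existing Cocoon API.

```java
import java.util.List;

/** Hypothetical contract: a pipeline component that can say whether its output is stale. */
interface Validatable {
    boolean hasChanged();
}

final class ForwardValidation {
    /**
     * Walks the pipeline from the generator onwards and returns the index of the
     * last component whose cached output can be reused, or -1 if there is none.
     * A component is only asked if every component before it said "not changed".
     */
    static int lastReusableIndex(List<Validatable> pipeline) {
        int last = -1;
        for (int i = 0; i < pipeline.size(); i++) {
            if (pipeline.get(i).hasChanged()) {
                break; // components after this one are never asked
            }
            last = i;
        }
        return last;
    }
}
```

For the four-step pipeline in the example (generator, translator, translator with XSP, serializer) this returns the index of the first translator, so processing resumes from its cached output.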

> > Now, if we want to add in some specialist caching behaviour we  
> > have an option of adding in an optional 'cache' component like this: 
> >  
> >     <map:match="test">  
> >       <map:generate src="http://myserver/resource.xml" />  
> >       <map:translate src="local.xslt" /> 
> > 	 <map:translate src="xsltwithxsp.xslt" /> 
> > 	 <map:cache> 
> > 	  <map:parameter name="maxUpdateFrequency" value="6 hours" /> 
> > 	 </map:cache> 
> > 	 <map:serialize type="html" /> 
> >     </map:match>  
> >  
> > Now, the same process as before takes place, but when the 'cache'  
> > component is asked to validate, it will normally just say, "I've  
> > not changed". In this example, once every 6 hours it will say,  
> > "I've changed if my input has changed".  If it says, "I've not  
> > changed" then the cached value of its last result is used without  
> > even having to call the validators further up the pipeline. 
> > 
> The same applies here: why not start with the generator and follow
> the usual order of the pipeline?

Yep, again.  Good point!
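
A rough sketch of how the 'cache' component's validator could behave, following the description quoted above: it answers "not changed" without consulting its input until the configured interval has elapsed. TimedCacheValidator and its injected clock are assumptions made for illustration only.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical validator for a cache step configured with maxUpdateFrequency. */
final class TimedCacheValidator {
    private final long intervalMillis;
    private final LongSupplier clock;   // supplies the current time in milliseconds
    private long lastCheckedMillis;

    TimedCacheValidator(Duration maxUpdateFrequency, LongSupplier clock) {
        this.intervalMillis = maxUpdateFrequency.toMillis();
        this.clock = clock;
        this.lastCheckedMillis = clock.getAsLong();
    }

    /**
     * Returns false ("I've not changed") while the interval has not elapsed, without
     * asking the input at all. Once it has elapsed, returns whatever the input
     * reports ("I've changed if my input has changed") and restarts the interval.
     */
    boolean hasChanged(BooleanSupplier inputHasChanged) {
        long now = clock.getAsLong();
        if (now - lastCheckedMillis < intervalMillis) {
            return false;
        }
        lastCheckedMillis = now;
        return inputHasChanged.getAsBoolean();
    }
}
```

Passing the clock in, rather than reading the system time directly, is only there to make the behaviour easy to exercise; a real component would presumably use System.currentTimeMillis().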

> > If this idea was possible it would provide a very simple  
> > framework:  all components have validators;  specialist  
> > validators exist as stand-alone cache components; the actual  
> > caching is carried out automatically. 
> >  
> I like this idea of the special cache components a lot as it would make 
> the sitemap design easier. But the main problem I see in this case is the 
> generator: 
> The FileGenerator can read a local file from harddisc and it can get XML 
> over http from another server. For a local file the generator can easily  
> detect if the file has changed by looking at the last modification date. 
> For external XML it is in many cases not possible to test if the source 
> has changed. So for such cases the special cache component is very useful. 
> The FileGenerator would say: "Has not changed" and the special cache 
> component would say: "Content changes every 6 hours". 
> But how does the FileGenerator know that it should say "Has not changed"?
> I see four possibilities:
> 1. The FileGenerator has its own logic, saying: I can test local resources
>    myself, external resources never change.
> 2. Same as above, but external resources always change.
> 3. Configuration of the file generator: 
>       <map:generate src="http://myserver/resource.xml"> 
> 		<map:parameter name="cache-has-changed" value="always"/> <!-- or never -->
> 4. The special cache component is tied to a pipeline component: 
>        <map:generate src="http://myserver/resource.xml">  
>  	 <map:cache> 
>  	  <map:parameter name="maxUpdateFrequency" value="6 hours" /> 
>  	 </map:cache> 
> 	</map:generate> 
>    If a component has a special cache component, that component does the testing;
>    if not, the component tries it itself, using 1. or 2.
> Personally I like 4. the most, as it is very close to the validator concept I posted
> yesterday.
> Perhaps this is again FS, but it is very straightforward and easy to implement.

I can see that FileGenerator creates some difficulties because:

1. You can only find out whether a remote resource has changed by connecting to it and
checking, and that connection is the very bottleneck the check is there to avoid.

2. It may not be possible to determine whether the resource has changed other than by doing a
byte comparison with the previous version.

However, I'm not sure that there is any need for these difficulties to manifest themselves
on the sitemap.  I think they should remain 'difficulties' for the developer of the FileGenerator
validator method.

Firstly, if in doubt, I think all sitemap components should respond to the validator method
call with an "I think I've changed" response.  This may sometimes lead to inefficiency, but
it ensures that the output is always 'as expected'.  With the model we are talking about,
the actual caching mechanism is hidden, but we could potentially have general caching controls
which would allow for things like an across-the-board 10 second cache delay, i.e. further
requests for the same page would come from cache (regardless of validation) for up to 10 seconds
after the last generated response.

Having said that, and thinking on my feet, this leads to a problem...  How do you distinguish
between unique page requests?  It is not enough to use the HTTP request URL to distinguish
unique page requests, because we may be delivering different pages based on the browser type,
session information, cookies, etc.  *But* there is a distinction worth making between "is
this request cached" and "is this cached request up to date".  So, I think components need
to be able to respond to three important requests in the pipeline:

	give me your output (here's the input)
	give me a unique cache key (here's the input)
	have you changed (here's the input)

The first request is what components are doing already.

The third request is what we've been talking about - the component is being asked to indicate
whether or not its output has changed given the input (and any other inputs it has internally).

The second request is the new one.  This is asking the component to generate a unique key
that identifies a unique set of inputs (including internal inputs) but ignores time.  In
other words, the response to this is effectively, "Here is a unique key which can be used
to look up my output in a cache.  This key guarantees that you will get the right item out of
the cache, but I'm not guaranteeing that the item is up to date".
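
To make the three requests concrete, here is one way they might look as a Java interface. This is only a sketch: PipelineComponent, its method names and the toy UpperCaseStep are invented for illustration and do not describe the existing component API.

```java
import java.util.Locale;

/** Hypothetical contract covering the three requests described above. */
interface PipelineComponent<I, O> {
    /** "Give me your output (here's the input)." */
    O output(I input);

    /** "Give me a unique cache key (here's the input)." Identifies the inputs, ignoring time. */
    String cacheKey(I input);

    /** "Have you changed (here's the input)?" True means a cached result may be stale. */
    boolean hasChanged(I input);
}

/** Toy component: its output depends only on its input, so it never changes on its own. */
final class UpperCaseStep implements PipelineComponent<String, String> {
    @Override
    public String output(String input) {
        return input.toUpperCase(Locale.ROOT);
    }

    @Override
    public String cacheKey(String input) {
        return "upper:" + input; // same input always gives the same key
    }

    @Override
    public boolean hasChanged(String input) {
        return false; // no internal inputs that could vary over time
    }
}
```

A component that varies its output by browser type, session or cookies would fold those values into cacheKey as well, which is what separates "is this request cached" from "is this cached request up to date".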

So the FileGenerator component would always return the same key for the same URL, but (in
the absence of any more sophisticated logic) would always return "I think I've changed" to
the validation call.  (I say in the absence of more sophisticated logic, because there is
no reason why FileGenerator couldn't utilise the cache to compare the remote resource with
the last cached version and accurately determine whether the remote resource had changed.
Whilst this wouldn't remove the bottleneck of requesting the remote file, it might reduce
CPU overhead if it removed the need to carry out processing further down the pipeline.)
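
The "more sophisticated logic" could be sketched roughly as follows: the key is simply the URL, and validation fetches the resource and byte-compares it with the last copy seen. ComparingFileGenerator and the injected fetcher function are hypothetical; how the bytes are actually fetched is left open.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical generator validator that compares a resource with its last seen copy. */
final class ComparingFileGenerator {
    private final Function<String, byte[]> fetcher;            // assumed: returns the resource bytes
    private final Map<String, byte[]> lastSeen = new HashMap<>();

    ComparingFileGenerator(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    /** The same URL always yields the same key. */
    String cacheKey(String url) {
        return url;
    }

    /**
     * Fetches the resource (the network bottleneck remains) and reports a change only
     * if there is no previous copy or the bytes differ from the previous copy.
     */
    boolean hasChanged(String url) {
        byte[] current = fetcher.apply(url);
        byte[] previous = lastSeen.put(cacheKey(url), current);
        return previous == null || !Arrays.equals(previous, current);
    }
}
```

The fetch still happens on every validation, but an unchanged result lets the rest of the pipeline be served from cache.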


With these three methods in the component API, it would be possible to implement a caching
mechanism that could be set up with an across the board delay (as I suggest above, before
I realised the issues!).  Thus, I think we could avoid having to worry about FileGenerator
as a special case.
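
As a sketch of that across-the-board delay, assuming the cache is keyed by the component's cache key: within the delay the cached value is returned without validation, and after it the validation answer decides. DelayedCache and its parameters are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical cache that skips validation for a fixed delay after each generation. */
final class DelayedCache<V> {
    private static final class Entry<T> {
        final T value;
        final long generatedAt;
        Entry(T value, long generatedAt) {
            this.value = value;
            this.generatedAt = generatedAt;
        }
    }

    private final Map<String, Entry<V>> entries = new HashMap<>();
    private final long delayMillis;
    private final LongSupplier clock; // supplies the current time in milliseconds

    DelayedCache(long delayMillis, LongSupplier clock) {
        this.delayMillis = delayMillis;
        this.clock = clock;
    }

    /**
     * Returns the cached value if it was generated less than delayMillis ago
     * (regardless of validation) or if validation reports no change;
     * otherwise regenerates the value and stores it.
     */
    V get(String key, BooleanSupplier hasChanged, Supplier<V> regenerate) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry != null
                && (now - entry.generatedAt < delayMillis || !hasChanged.getAsBoolean())) {
            return entry.value;
        }
        V fresh = regenerate.get();
        entries.put(key, new Entry<>(fresh, now));
        return fresh;
    }
}
```

With a delay of 10000 milliseconds, even a component that always answers "I think I've changed" is regenerated at most once every 10 seconds per key.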


Stuart Roebuck                        
Lead Developer                               Java, XML, MacOS X, XP, etc.