cocoon-dev mailing list archives

From Hans Ulrich Niedermann <>
Subject Re: [Cocoon Devel]Is Cocoon2 caching implemented?
Date Wed, 16 Aug 2000 02:27:08 GMT
Brian May <> writes:

> >>>>> "Hans" == Hans Ulrich Niedermann <>
>     Hans> [ XSLT-specific caching internals/proposals deleted ]
>     >> What about adding behaviours like
>     >> 
>     >> getLastModified() getWhenExpires()
>     >> 
>     >> to each pipeline component, that returns a null by default,
>     >> then have the sitemap work out max() by calling each in turn?
> Not sure I like the max() part. 
> I think it might be better to pass the caching details along each
> pipeline component (in a similar way I presume that the data is passed
> along), so each component can inspect the details from the previous
> component, and modify the details appropriately. So a component could
> say that the expiry date = prev expiry date - 10%, for instance.
> Of course, I haven't seen the sitemap code, so I am not sure about how
> this would be implemented.

I haven't seen any code yet either.


Hmm. Let's consider one component C. The data C delivers is
a function F of
(a) data r from the _r_equest, i.e. the type of request, the URI, 
    session IDs, POST data etc.
(b) the current state s of a subset of the universe
(c) data the _p_revious component in the pipeline delivers 
    This is a function P(r,s)

So we end up with a chain of components that implement a function
F((r,s),P(r,s)) and process data consecutively from d[n] into
d[n+1]=F(r,s,d[n]) where n is a finite number and d[0] is the empty

For each component, the sitemap (which somehow has to set up the cache
engine) knows only
(i)   the component itself (which knows details about F)
(ii)  the previous component (which produces our input data)
(iii) the request data r



Gut feeling says that expiry time and last modified time should
somehow be glued to the data on its journey along the pipeline, as
these times may vary between different requests. 

So at a certain component C[n], the data d[n] is processed into
d[n+1]=F(r,s,d[n]) and the sitemap/cache engine has to find out
whether it should cache d[n+1] for future requests. There are two cases:

a) C[n] doesn't produce anything cachable at all (e.g. it includes a
   random transaction number). Then the data should be passed on
   without even consulting the cache with the expiry date set to
b) C[n] produces data that should sometimes be cached.

I will now assume for method prototypes that the request r is
represented by an instance of the Environment class.

In case b) the 
* expiry date of d[n+1] should be set to the earlier one of 
  - the expiry date of d[n]   and
  - some value derived from (r,s) depending on F
    implemented by C[n].getExpiryTime(Environment e)
* last modified date of d[n+1] should be set to the later one of
  - the lastmod date of d[n]  and
  - some value derived from (r,s) depending on F
    implemented by C[n].getLastModifiedTime(Environment e)
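
The case b) rule above could be sketched in Java roughly as follows. This
is only an illustration under assumptions: the Environment stand-in, the
CacheTimes holder, the CacheableComponent interface and CacheTimeRules are
hypothetical names, and times are taken to be milliseconds held in a long.
Only getExpiryTime(Environment) and getLastModifiedTime(Environment) come
from the text above.

```java
/** Hypothetical stand-in for the Environment class (the request r). */
final class Environment {
    final String uri;
    Environment(String uri) { this.uri = uri; }
}

/** Hypothetical holder for the two times glued to the data d[n]. */
final class CacheTimes {
    final long expires;       // assumed: milliseconds since the epoch
    final long lastModified;  // assumed: milliseconds since the epoch
    CacheTimes(long expires, long lastModified) {
        this.expires = expires;
        this.lastModified = lastModified;
    }
}

/** Hypothetical component interface with the two methods named above. */
interface CacheableComponent {
    long getExpiryTime(Environment e);
    long getLastModifiedTime(Environment e);
}

final class CacheTimeRules {
    /** Derives the times for d[n+1] from those of d[n] and from C[n]. */
    static CacheTimes next(CacheTimes prev, CacheableComponent c, Environment e) {
        // Expiry: the earlier of the input's expiry and the component's own.
        long expires = Math.min(prev.expires, c.getExpiryTime(e));
        // Last modified: the later of the input's and the component's own.
        long lastModified = Math.max(prev.lastModified, c.getLastModifiedTime(e));
        return new CacheTimes(expires, lastModified);
    }
}
```

Calling next() once per component, in pipeline order, carries both values
from d[0] to the final output.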

The cache engine would have to use some hash value H(r) as an index
for the caching of d[n+1]. If this H(r) is also spit out by the
component glued to d[n+1], this would allow caching in connection with
components which determine whether to behave like a) or b) according
to a _subset_ of r. The component just has to choose the proper subset
of r to calculate the hash value from. This leads to some
Component.getHashFromEnvironment(Environment e) method.
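
A rough Java sketch of hashing only a subset of r, under assumptions: the
request is modelled here as a plain Map of parameter names to values
instead of an Environment, and RequestHash and hashOfSubset are
hypothetical names. It uses generics and so needs a modern Java version.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class RequestHash {
    /**
     * Hashes only the request entries whose keys the component declares
     * relevant. Entries outside relevantKeys cannot influence the result.
     */
    static int hashOfSubset(Map<String, String> request, Set<String> relevantKeys) {
        Map<String, String> subset = new HashMap<>();
        for (String key : relevantKeys) {
            if (request.containsKey(key)) {
                subset.put(key, request.get(key));
            }
        }
        // Map.hashCode() is the sum of the entry hash codes, so the
        // result does not depend on iteration order.
        return subset.hashCode();
    }
}
```

With this, a component whose output depends only on the URI and the
language would get the same cache index for two requests that differ only
in their session ID.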

Perhaps the two dates and the hash value should be included into the
data objects that are passed on along the pipeline. This would enable
each component to re-use the data it calculates anyway for d[n] for
the calculation of these three values as well.



Please note that the usage of the words "date" and "time" is not
consistent. Most times, I mean something like a "time stamp".

It could be that a component changes between a) and b) depending on a
_subset_ of r. How could that be handled?

What have I overlooked?


>     Hans> Sounds good to me. I've thought about caching during the
>     Hans> last few weeks (without having a look into existing caching
>     Hans> code) and came up with a similar method but that didn't go
>     Hans> that far.
>     Hans> However, the getLastModified() and getWhenExpires() methods
>     Hans> probably have to know about request parameters (URI params,
>     Hans> Post stuff, cookies, sessions etc.) to determine if the
>     Hans> output data has changed.
> That's an interesting idea that takes it beyond what I was thinking of.
> This should allow very fine tuning of how long a page can be cached
> for. The more I hear about C2, the more I like it ;-)

I'm still not sure if this idea doesn't lead to FS.

> Some things to consider: some pages don't need to expire. I guess you
> can just give them a very advanced expire date. However, are there any
> pages that should never be cached? I can't think of any off hand...
> Another thing to think about: how do you set the expiry time for
> static files. Perhaps expire date = lastmodified + config value.
> As, IMHO, this depends largely on the administrators of the
> site, and how often they plan to make changes.

I wouldn't take the idea of an expiry time as a long integer very
seriously. Of course, there also has to be a way to specify an expiry
time of "never" and "undefined". 

Similarly, for the time of last modification, "undefined" and "Big
Bang" should be possible values?
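
One way to get beyond a bare long integer, sketched in Java. Everything
here is an assumption made for illustration: the TimeStamp class, its
constants, and in particular the choice that "undefined" is contagious,
i.e. combining anything with UNDEFINED yields UNDEFINED.

```java
/** Hypothetical time stamp with "never", "Big Bang" and "undefined". */
final class TimeStamp {
    static final TimeStamp BIG_BANG  = new TimeStamp(Long.MIN_VALUE, true);
    static final TimeStamp NEVER     = new TimeStamp(Long.MAX_VALUE, true);
    static final TimeStamp UNDEFINED = new TimeStamp(0L, false);

    private final long millis;
    private final boolean defined;

    private TimeStamp(long millis, boolean defined) {
        this.millis = millis;
        this.defined = defined;
    }

    /** An ordinary point in time, assumed to be in milliseconds. */
    static TimeStamp at(long millis) {
        return new TimeStamp(millis, true);
    }

    boolean isDefined() {
        return defined;
    }

    /** The earlier of the two, as needed for expiry times. */
    TimeStamp earlier(TimeStamp other) {
        if (!defined || !other.defined) return UNDEFINED;
        return millis <= other.millis ? this : other;
    }

    /** The later of the two, as needed for last-modified times. */
    TimeStamp later(TimeStamp other) {
        if (!defined || !other.defined) return UNDEFINED;
        return millis >= other.millis ? this : other;
    }
}
```

A page that never expires would then carry TimeStamp.NEVER, and a
component that adds no modification time of its own would report
TimeStamp.BIG_BANG, so neither changes the result of earlier() or later().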

>     >> This makes more sense to me because the idea of modification
>     >> time belongs on the component (be it generator, filter or
>     >> serializer) and not at the sitemap level...
>     >> 
>     >> For example, suppose there is a filter that uses some
>     >> time-based criteria to change the way it generates a file
>     >> (maybe black bg for evening, yellow for daytime). No files
>     >> change, but the last-modified *does* change.
> Agreed.
> Also, it would be really good (for some broken applications) if the
> pipeline stage can pass the caching details on without modification.
> This is one problem I have had with Apache. My university has a
> (stupid) policy that all personal web pages need to be processed by a
> CGI script that appends a legal disclaimer to the bottom. However, this
> means that the file can't be cached. Arggh!

BTW: What university is this?

> [ however, I must admit this raises other issues that don't appear to
> be a priority (yet?), eg serving personal pages with Cocoon ]

What is special with personal pages?

>     Hans> But this suggests adding a third method to all pipeline
>     Hans> components that tells the cache engine if caching results
>     Hans> makes any sense at all (imagine a component that outputs the
>     Hans> current time).
> This depends on how accurate the time needs to be. If you had a web
> page designed for synchronising your watch to the second, then maybe
> this might be an issue...

I admit I should rather have written about a timestamp with picosecond

