cocoon-dev mailing list archives

From "Paulo Gaspar" <>
Subject RE: Adaptive Caching [was Re: initial checkin of the Scheme code]
Date Tue, 18 Dec 2001 16:06:13 GMT
Hi Stefano,

Answer inline:

> -----Original Message-----
> From: Stefano Mazzocchi []
> Sent: Monday, December 17, 2001 12:50 PM
> Paulo Gaspar wrote:
> > What I mean is that the cache could accumulate quite some wrong cost
> > measurements for long periods of time if they would happen with
> > resources having a long "cache life".
> Correct, but there is no such thing as a perfect cache. Keep this in
> mind. We are dealing with stochastic properties and what we can do is
> 'converge' toward the optimal solution for the entire system. Single
> resources will not be perfectly cached, but the entire system load will
> (if there are enough statistics available).
> So, please, try to see the entire system and not the single page.

Of course. I am NOT saying "we must have no errors", I am saying "maybe
we could have fewer errors".
> [...]
> > What I say is that this kind of error can happen a lot. That will
> > have a different cost depending on the characteristics of the system
> > (e.g.: longer cache lives => higher cost).
> Again, correct. In my paper I clearly wrote that the algorithms
> described work best on systems that exhibit time-local behavior, that
> is, don't exhibit high frequencies of cost variations.
> Like any retroactive control system, it cannot possibly have the same
> behavior for all frequencies. The control system I described has a
> 'low-pass' behavior: works best on slowly moving costs... the problem is
> that I couldn't come up with a way to tune the system for higher
> frequencies without creating an impossible overhead on the system :/
> But if you have a suggestion, I'm more than happy to follow it.

I am just trying to figure out if you are considering each instance of a
"document fragment" or each "family". Sorry, but I still do not understand.

> > What is a resource? Is it "Invoice detail view" or is it the "detail
> > view of invoice number 5678665"?
> A "resource", in this case, is every document fragment at all levels,
> the output of each pipeline component.

Yes, I understand that the Cache can work in terms of fragments at all
levels and so on.

What I do not understand is: having an XML fragment produced by a given
generator (that gets an invoice from the database and "translates" it
into XML), do you _always_ track costs per invoice instance coming
from that generator (one entry with the average cost of generating XML
for invoice #12456 and another one for invoice #12122), or could you keep
ONE cost entry aggregating the cost of all invoices coming from that
generator (only one entry with the average cost of generating XML for
any invoice, #12456's cost added together with #12122's)?

If in the case above all the invoices "produced" are very similar, the 
cache could measure their cost together, using a single sampling key for
all the invoices produced by the above generator (although each invoice 
would still have a different cache storage key, of course).

In such situations, if a Cocoon developer could _optionally_ supply a
"cost family" key, the cache could have much better cost data (more
frequent measurements, since any invoice generated contributes to the
same measurement pool) and this cost data would take less space (same
cost data for all the invoice instances).
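
To make the idea concrete, here is a minimal Java sketch of what I mean.
It is only an illustration under my own assumptions, not Cocoon code and
not the algorithm from your paper: the class name CostFamilySketch, the
methods record/averageCost/entryCount and the key strings are all
hypothetical, and a plain running average stands in for whatever cost
statistic the cache really keeps. The point is only that the sampling
key falls back to the storage key when no "cost family" is supplied.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: cost statistics keyed by an optional "cost family". */
class CostFamilySketch {

    /** Running average of measured production costs (e.g. milliseconds). */
    static final class CostStats {
        private long samples;
        private double mean;

        void add(double cost) {
            samples++;
            mean += (cost - mean) / samples; // incremental mean update
        }

        double mean() {
            return mean;
        }
    }

    // One entry per sampling key, NOT per cached fragment.
    private final Map<String, CostStats> statsBySamplingKey = new HashMap<>();

    /** The cost family is optional; without it each fragment is sampled alone. */
    private static String samplingKey(String storageKey, String costFamily) {
        return (costFamily != null) ? costFamily : storageKey;
    }

    /** Records one cost measurement for the fragment stored under storageKey. */
    void record(String storageKey, String costFamily, double cost) {
        statsBySamplingKey
                .computeIfAbsent(samplingKey(storageKey, costFamily), k -> new CostStats())
                .add(cost);
    }

    /** Average cost for the fragment's sampling key, or NaN if never measured. */
    double averageCost(String storageKey, String costFamily) {
        CostStats stats = statsBySamplingKey.get(samplingKey(storageKey, costFamily));
        return (stats == null) ? Double.NaN : stats.mean();
    }

    /** Number of cost entries kept (the space cost of the statistics). */
    int entryCount() {
        return statsBySamplingKey.size();
    }

    public static void main(String[] args) {
        CostFamilySketch cache = new CostFamilySketch();
        // Two invoices share one family, so they feed the same measurement pool.
        cache.record("invoice/12456", "invoice-detail", 40.0);
        cache.record("invoice/12122", "invoice-detail", 60.0);
        // No family supplied: this fragment gets its own cost entry.
        cache.record("report/2001-12", null, 900.0);

        // An invoice never generated before already has usable cost data.
        System.out.println(cache.averageCost("invoice/99999", "invoice-detail"));
        System.out.println(cache.entryCount());
    }
}
```

Each invoice would still be stored under its own storage key; only the
cost bookkeeping is shared, so two invoices plus one unrelated fragment
need two cost entries instead of three.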

Maybe this is already what you have in mind, but we are having a 
communication problem around this one.

Have fun,
Paulo Gaspar
