cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Adaptive Caching [was Re: initial checkin of the Scheme code]
Date Wed, 12 Dec 2001 18:17:12 GMT
Talking about placing too many irons in the fire :)

Paulo Gaspar wrote:

> The "adaptive caching" idea just arrived too early but I hope it is not
> forgotten.

You can bet your ass it's not :) [excuse my oxford english]

The concept I proposed (more than 6 months ago) about adaptive caching
is difficult to understand because it's very different from the usual
caching approach.

Let me try to explain it:

When a request arrives for a resource (which might also be a pipeline
fragment), you have two choices: ask the cache whether the resource is
still valid, or go right ahead and regenerate it.

And then, after it's regenerated, should I save it for further use? If
so, where? Disk or memory? And if the memory is full, what should I
throw away?

There is a very simple answer for all of this: do whatever gives you the
best performance.

Period.

The real problem is: once I've taken a decision (cache or don't cache)
how do I know what would have happened performance-wise if I had taken
the other choice?

This is the key problem.

I propose a simple solution: add 'selection noise' and use incoming
requests as sampling events on the system to test.

It's a trick: the objective is to reduce the time it takes to handle
those requests, but I also use them to obtain timing information on the
system, and I superimpose a stochastic sampling on the decision-making
process.

The problem is that sampling uses real user requests, so we must limit
how often those requests are sent down the slower path: my solution is
to make this 'selection noise' a function of the difference between the
two paths. So, if going thru the cache system is, on average, 3 times as
fast, I send 1 request in 100 down the other path. If instead the cache
yields a 20% improvement (1.2 times as fast), I sample 1 in 5, and so on.
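
To make the idea concrete, here is a minimal Python sketch of such a
'selection noise' function. The message does not give the actual
mapping, so the power-law form, the name sampling_probability and the
exponent parameter are all assumptions made for illustration. With an
exponent of 4 it lands in the same rough range as the figures above
(about 1 in 4 at 1.2x, about 1 in 160 at 3x), but it does not reproduce
them exactly.

```python
def sampling_probability(time_a: float, time_b: float, exponent: float = 4.0) -> float:
    """Probability of sending a request down the currently slower path.

    Hypothetical mapping (not from the original message): 0.5 when both
    paths cost the same, falling off as a power of the speed ratio.
    """
    if time_a <= 0 or time_b <= 0:
        raise ValueError("average times must be positive")
    fast, slow = min(time_a, time_b), max(time_a, time_b)
    ratio = slow / fast  # always >= 1
    return 0.5 * ratio ** (-exponent)


for ratio in (1.0, 1.2, 2.0, 3.0):
    p = sampling_probability(1.0, ratio)
    print(f"slower path is {ratio:.1f}x slower -> sample it with p = {p:.4f}")
```

Any monotonically decreasing function of the ratio would serve; the
only property the argument relies on is that a bigger difference means
rarer sampling.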

This guarantees that:

 1) users don't perceive this, since the faster one path is, the less
frequently the other path is sampled.

 2) the system remains stable and adaptive: if sampling the other path
reduces the difference, the frequency of sampling increases, thus
ensuring a way to 'steer' the decision making following the actual
system performance.
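
The 'steering' in point 2 can be sketched as a small feedback loop. This
is only an illustration under my own assumptions: the class name
AdaptiveChooser, the exponential moving average, the power-law sampling
probability and the two fixed path costs are all hypothetical and not
taken from the original proposal. The simulation starts with the cache
looking faster, while regeneration has in fact become cheaper; only the
occasional sampled request can reveal that.

```python
import random


class AdaptiveChooser:
    """Picks between two paths, occasionally sampling the slower one."""

    def __init__(self, initial_estimates, alpha=0.3, exponent=4.0, rng=None):
        self.estimates = dict(initial_estimates)  # path name -> average time
        self.alpha = alpha                        # moving-average weight
        self.exponent = exponent                  # steepness of the noise
        self.rng = rng if rng is not None else random.Random()

    def preferred(self):
        # The path currently believed to be faster.
        return min(self.estimates, key=self.estimates.get)

    def choose(self):
        fast = self.preferred()
        slow = next(p for p in self.estimates if p != fast)
        ratio = self.estimates[slow] / self.estimates[fast]
        p_sample = 0.5 * ratio ** (-self.exponent)
        return slow if self.rng.random() < p_sample else fast

    def record(self, path, elapsed):
        # Fold the measured time into the running estimate for that path.
        old = self.estimates[path]
        self.estimates[path] = (1 - self.alpha) * old + self.alpha * elapsed


def simulate(n_requests=5000, seed=1):
    chooser = AdaptiveChooser({"cache": 1.0, "regenerate": 3.0},
                              rng=random.Random(seed))
    true_cost = {"cache": 1.0, "regenerate": 0.5}  # regeneration got cheap
    for _ in range(n_requests):
        path = chooser.choose()
        chooser.record(path, true_cost[path])
    return chooser


chooser = simulate()
print("preferred path after simulation:", chooser.preferred())
print("estimates:", chooser.estimates)
```

Each sampled request pulls the estimate for the slower path toward its
real cost, which raises the sampling probability, which in turn speeds
up the correction until the preference flips.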

Sampling sacrifices a little peak performance for adaptability, but
allows the cache system to transparently adapt even to those cases where
caching is "slower" than regenerating the resource and not storing it in
the cache system (storing also has a cost).

Another almost magic effect of this system is that we have a direct
measure of the efficiency of the cache: assuming time-locality of
actions, I have a way to measure directly the amount of CPU time the
cache system saved.

How so?

Every time a request comes in, I have the 'estimated' time the resource
will need to be generated on the different routes. Once the decision is
taken, I know how long it actually took and I can compare that with the
"assumed" time it would have taken on the other path. Then I know how
much time I *saved* with this choice.
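
As a rough sketch of that bookkeeping (the function name time_saved and
the sample numbers are made up for illustration), the saving per request
is the estimated time of the path not taken minus the measured time of
the path taken, summed over all requests:

```python
def time_saved(decisions):
    """Total of (estimated other-path time - measured time) per request.

    `decisions` is a sequence of (measured, other_estimate) pairs.
    """
    return sum(other_estimate - measured
               for measured, other_estimate in decisions)


log = [
    (1.0, 3.0),  # served from cache; regeneration was estimated at 3.0
    (1.2, 3.0),  # same again, slightly slower cache lookup
    (3.1, 1.1),  # a sampling request on the slower path: negative saving
]
total = time_saved(log)
print(f"estimated time saved: {total:.2f}")
```

Sampling requests show up as negative entries, so the total already
accounts for what the 'selection noise' itself costs.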

But we don't stop here: we also have a way to measure the efficiency of
the cache itself between cache hits and cache misses.

This information might be used to estimate the RAM a server would
require to obtain near-maximum efficiency.

And all this without even touching a single configuration!!!

A wonderful example of the advantage of adaptive caching is that you get
immediate feedback (with numbers!!!) on how much a change, say, in your
database monitoring code, influences caching efficiency.

This is a completely innovative approach because the decision whether or
not to cache something is usually estimated a priori by system
administrators, but in complex systems very few people can make this
decision well and retune it for every change in the system.

Consider this as a hotspot caching machine.

Find attached my original message (WARNING: it's more than 20 printed
pages!), maybe this time more people will understand it. :)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------