cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: Adaptive Caching [was Re: initial checkin of the Scheme code]
Date Thu, 13 Dec 2001 18:48:08 GMT
Berin Loritsch wrote:
> 
> Berin Loritsch wrote:
> 
> > Stefano Mazzocchi wrote:
> >
> >> Talking about placing too many irons in the fire :)
> >>
> >
> >
> > Stefano, please forgive my ignorance, as I have an Associates Degree in
> > Music Recording, and not a traditional CS degree.  I haven't had the
> > higher math classes you and many others have had here.

Oops, sorry about that: having had *many* of those higher math classes,
I consider this very basic stuff and tend to forget that it is not,
even for programmers (well, the concepts themselves are, as you know).

> > I understand Algebra and Trigonometry ok, so if we can simplify these
> > algorithms into a "pseudo-code" so I can wrap my head around them, I
> > will be happy to jump in the discussion.  As it is now, it's going
> > over my head.

Thanks to Leo for doing that.

> > I understand the Calculus ideas of functions (i.e. V(r,t)).  What throws
> > me are the Sigmas (I studied Ancient Greek, so I know what the letters
> > are).
> > I know they are meant to represent sums, or statistical data.  I don't know
> > what the letter above the sigma or below the sigma is supposed to signify.

It's where to start and where to stop: the expression below the sigma
gives the summation index and the value it starts from, and the one
above gives the value where it stops. In fact, math, after centuries of
use, has an incredibly good syntax (unfortunately, it requires full 2D
vector graphics to show and doesn't render very well as ASCII art :(
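
For example, writing it in TeX-ish ASCII since that's all we have here,
the total validity estimation cost over the m valid request times would
read:

  \sum_{i=1}^{m} V(r, t_i)  =  V(r, t_1) + V(r, t_2) + ... + V(r, t_m)

where the "i=1" below the sigma is where the index starts and the "m"
above it is where it stops.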

> > If you don't feel up to giving me a quick math lesson so I can grok the
> > concepts fully, can you point me to a resource where I can teach myself?

Oh, no problem explaining what this means to anyone who asks.

> >> Thus, given
> >>
> >>  r        := resource
> >>  t        := time of the request
> >>  V(r,t)   := validity estimation cost
> >>  P(r,t)   := production cost when caching is not performed
> >>  Pc(r,t)  := production cost when caching is performed
> >>  l(r,t)   := lookup cost
> >>  s(r,t)   := storage cost
> >>
> >> and
> >>
> >>  I := sequence of invalid requests time [1,n]
> >>  V := sequence of valid request time [1,m]
> 
> Also, I have to ask which aspects of the resource are we calculating?  Time
> changed? size in bytes?

That's not defined at this point; in code terms we would say it is
"pluggable" or "abstract".

These equations are sort of a math framework where you can plug in your
own cost function and the whole thing still works.

[I admit it's probably because of these *intensive* higher math studies
of mine that I have such a framework-oriented mindset]
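
To put the "pluggable" part in code terms, here is a minimal Java sketch
of what such a cost function contract could look like (the interface and
method names are made up for illustration, this is not existing Cocoon
code):

  // Hypothetical contract for a pluggable cost function.  What "cost"
  // actually measures (CPU time, disk, heat, ...) is entirely up to the
  // implementation; the caching algorithm only compares the numbers.
  public interface CostFunction {

      /** V(r,t): cost of estimating whether the cached resource is still valid. */
      double validityEstimationCost(Object resource, long time);

      /** P(r,t): cost of producing the resource when caching is not performed. */
      double productionCost(Object resource, long time);

      /** Pc(r,t): cost of producing the resource when caching is performed. */
      double cachedProductionCost(Object resource, long time);

      /** l(r,t): cost of looking the resource up in the cache. */
      double lookupCost(Object resource, long time);

      /** s(r,t): cost of storing the resource in the cache. */
      double storageCost(Object resource, long time);
  }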

> t = relative millis since beginning of adaptation, time since last request, or absolute millis?

Again, not defined. It's the time axis. It's up to you to define what
the resolution is and where the zero of the axis is. It's not the
equation's concern to define that (at least, at this point).

> Also, what is the sample window that we have to look at?  It is impractical
> to look at all samples for the life of the cache, so how many do we need to track?

I didn't define this either; anyway, I expect it *not* to be impractical
to consider all the samples, as long as the statistical information
(average, standard deviation) is stored instead of the entire array of
values.
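
Just to sketch what I mean in code (the class name is made up, only the
standard online-average trick is real): keeping per-resource running
statistics costs a constant, tiny amount of memory no matter how many
samples have been seen.

  // Hypothetical helper: keeps O(1) state per resource instead of the
  // whole history of cost samples (Welford's online algorithm).
  public class RunningStats {
      private long count;
      private double mean;
      private double m2; // sum of squared deviations from the mean

      public void addSample(double value) {
          count++;
          double delta = value - mean;
          mean += delta / count;
          m2 += delta * (value - mean);
      }

      public double getAverage() {
          return mean;
      }

      public double getStandardDeviation() {
          return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
      }
  }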
 
> It also helps to know the units in which these functions are measured, such as
> Validity Estimation Cost (my assumption is time in millis).

The dimension for a function of cost is [cost]. :)

If you define what [cost] is at this point, you are throwing away the
ability to plug in different cost functions.

For example: in a system where you have all the disk space you want,
disk cost is zero, so you don't take it into account; only CPU time is
your scarce resource, so that is what goes into the cost function.

On a hypothetical embedded system, the scarce resource could be 'thermal
dissipation' while disk and CPU time might be less important.

How could you estimate, by hand, which resources are better to cache
or not, depending on how much 'heat' they generate inside the system?

This is the power of my adaptive caching scheme with a pluggable cost
function: it's up to you to define what you think is scarce on your
system and to provide a way to measure that scarce resource; the caching
behavior is handled by the exact same algorithms.

This also shows how limiting the assumption is that cost's dimension is
time, made only because you are used to thinking that the only scarce
resource on a server system is processing speed.
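
Two quick sketches of what "measuring the scarce resource" could look
like in code; the first is the usual elapsed-milliseconds measure, the
second is obviously invented (the sensor interface is included only to
keep the sketch self-contained):

  // Sketch: on a typical server the scarce resource is CPU time, so
  // "cost" is simply the milliseconds an operation takes.
  public class CpuTimeCost {
      public static double measure(Runnable operation) {
          long start = System.currentTimeMillis();
          operation.run();
          return (double) (System.currentTimeMillis() - start);
      }
  }

  // Completely made-up sensor interface, just to keep the sketch
  // self-contained.
  public interface TemperatureSensor {
      double readTemperature();
  }

  // Sketch: on the hypothetical embedded system the scarce resource is
  // thermal dissipation, so "cost" is how much an operation heats the
  // device up.  Same caching algorithms, different numbers.
  public class ThermalCost {
      private final TemperatureSensor sensor;

      public ThermalCost(TemperatureSensor sensor) {
          this.sensor = sensor;
      }

      public double measure(Runnable operation) {
          double before = sensor.readTemperature();
          operation.run();
          return sensor.readTemperature() - before;
      }
  }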
 
> Also note that storage and lookup costs can be reduced with asynchronous file IO.
> It is not simple to program in those terms, but such a mechanism can speed up the
> scalability of such operations.

Absolutely, but once you have introduced a way for the system to
'measure' those costs, any improvement on the architecture will be
balanced by an adaptation of the cache system, without even telling it
that you changed something.

> You also have to consider the fact that an effective cache uses both memory and
> file IO, so the storage costs between them vary greatly.

Absolutely, this is why the cost function is not defined and the math
framework is kept abstract.
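
As a tiny illustration (the per-byte weights are pure invention, a real
implementation would measure them), the storage cost you plug in could
simply report different values depending on which store the entry lands
in:

  // Sketch: s(r,t) reported differently for the memory store and the
  // disk store; the weights below are invented for illustration only.
  public class StoreAwareStorageCost {
      private static final double MEMORY_WEIGHT_PER_BYTE = 10.0; // scarce, fast
      private static final double DISK_WEIGHT_PER_BYTE   = 0.1;  // plentiful, slow

      public double storageCost(long sizeInBytes, boolean storedInMemory) {
          double weight = storedInMemory ? MEMORY_WEIGHT_PER_BYTE
                                         : DISK_WEIGHT_PER_BYTE;
          return sizeInBytes * weight;
      }
  }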

Hope this helps.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------



---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org

