cocoon-dev mailing list archives

From: Berin Loritsch <blorit...@apache.org>
Subject: Re: [RT] Adaptive Caching
Date: Wed, 16 Jul 2003 19:40:58 GMT
Stefano Mazzocchi wrote:
> 
>>
>> What you left as an exercise to the reader was the Cost function.
> 
> 
> See my email to Marc for more insights.

And my response to that.  I think I am starting to "get it".  Having
an adaptive cost function with an adaptive cache can efficiently
protect against absolute constraints and promote the primary concern.
In the end we might find that memory/disk space consumption is our
most precious resource, especially if we have absolute limits imposed
on us from outside.  Time would always expand to fill the remaining
proportion.

> 
>> The
>> rules-based approach is one way of achieving that cost function 
>> declaratively.
> 
> 
> True. But I prefer a purely statistical approach because I'm lazy and I 
> don't want to write those rules: I want the system to find them out for 
> me. ;-)

I am most concerned with rules imposed from outside.  For example, some
hosting environments will have sys admins on your back if you have runtime
requirements that exceed their perceptions of optimal resource use.  In those
cases we don't have the luxury of a purely adaptive cache.

>> Explicit rules are easier to
>> debug, understand, and predict.
> 
> Here I disagree.

Currently we have a very simple caching rule:

If memory consumption goes higher than a particular threshold, start purging
the cache.  This in turn increases the cache miss ratio, but the purging is
smart enough to get rid of old and not recently accessed entries first.

Anyhoo, this is neither here nor there.  I am beginning to see how we can
impose close to absolute limits with a simple weighted cost function.
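Roughly what I have in mind, just as a sketch (the names and the weighting
scheme here are mine, not anything from the whitepaper):

// Sketch only: an externally imposed memory ceiling becomes an absolute
// constraint, and the time term expands into whatever proportion is left.
public class WeightedCost {
    private final double memoryWeight;  // precedence of the memory concern
    private final double timeWeight;    // precedence of the time concern
    private final long memoryCeiling;   // hard limit imposed from outside (bytes)

    public WeightedCost(double memoryWeight, double timeWeight, long memoryCeiling) {
        this.memoryWeight = memoryWeight;
        this.timeWeight = timeWeight;
        this.memoryCeiling = memoryCeiling;
    }

    /** Weighted cost of holding this entry; higher means less worth keeping. */
    public double cost(long bytesUsed, double productionTimeMs) {
        double pressure = (double) bytesUsed / (double) memoryCeiling;
        if (pressure >= 1.0) {
            return Double.POSITIVE_INFINITY;  // the absolute constraint always wins
        }
        // the memory term blows up near the ceiling, the time term fills the rest
        return memoryWeight * (pressure / (1.0 - pressure))
                + timeWeight * productionTimeMs;
    }
}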

> 
>> Truth be told, your search algorithm might be as efficient as they come.
>> It might not.
> 
> True. It has to be demonstrated. But the incredible ability of bogofilter 
> gives me hope that it is the case even for caching.

As long as we have a consistent interface for the cache controller, we can
swap out search algorithms and still use our same cost function.  That way
we can find the optimal search algorithm for the cache.  Anyhoo, this is
another discussion entirely.
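Something like this is all I mean by a consistent interface (the names are
made up; nothing like this exists in Cocoon today):

import java.util.Collection;

// Hypothetical contracts: the cost function and the search/eviction
// strategy are separate pluggable pieces, so the search algorithm can be
// swapped without touching the cost function.
interface CacheEntry { /* key, stored response, bookkeeping */ }

interface CostFunction {
    double cost(CacheEntry entry);
}

interface EvictionStrategy {
    /** Pick the next entry to purge, given the current cost function. */
    CacheEntry selectVictim(Collection entries, CostFunction costFunction);
}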


>> But do keep in mind the poor administrator/installer.  The guy who is
>> managing the install is more interested in the efficiency of the cache
>> and how Cocoon uses resources than the developer.  The user in many cases
>> won't know the difference because their network connection is throttling
>> the response, or their browser is the bottleneck.  The efficiency of the
>> cache will likely have the most impact on the scalability of the server.
>> That is my OPINION, and should not be taken as gospel.  Please do not
>> shoot me for holding that opinion.
> 
> 
> Not at all. When implemented, the dynamic adaptability will be 
> configurable and you can turn it off or plug in your own cache if you 
> dislike it.
> 
> Still, keeping in mind the poor administrator/installer, I don't see how 
> a rule engine can be easier to configure than a system that doesn't need 
> anything to learn, adapting and optimizing resources as it works, by 
> itself and with no need for supervision.

The big thing is that we need a way for the administrator/installer to set
hard limits and/or the order of precedence for the cost function parameters.
That way they don't have to experiment with the actual weighting values,
and the weights will be just as adaptive as the search algorithm.
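For instance, the precedence list could be turned into normalized weights so
nobody ever touches raw numbers (purely illustrative, my own scheme):

// Sketch: the admin states only an order of precedence; the weights are
// derived from it and normalized, so there is nothing to hand-tune.
public class CostConfiguration {
    /** e.g. { "memory", "disk", "time" } in decreasing precedence. */
    public static double[] weightsFromPrecedence(String[] concerns) {
        double[] weights = new double[concerns.length];
        double total = 0.0;
        for (int i = 0; i < concerns.length; i++) {
            weights[i] = 1.0 / (i + 1);   // 1, 1/2, 1/3, ...
            total += weights[i];
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= total;          // normalize so the weights sum to 1
        }
        return weights;
    }
}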

>> I took the 18 pages home and started to play with some code.  I 
>> appreciate
>> the journey to get to the search algorithm you proposed.  The functions
>> as declared in the beginning were not that efficient, and on my machine
>> as soon as you had more than ~410 samples for the invalid/valid cost
>> evaluations the function took ~10ms (the granularity of my clock) to
>> evaluate.  The evaluation used constants for all the cost figures because
>> that would be the computationally cheapest way to evaluate the efficiency
>> of those functions.
>>
>> After this exercise came, to my dismay, the better way of evaluating a
>> continuous cost value.  That way, instead of having a sample for each
>> request at a particular time, we only had to work with three samples.
>> A vastly improved search algorithm in terms of being computationally
>> cheap.
> 
> Hmmm, I'm very interested in this, can you elaborate more?

I started by implementing the first algorithms for valid(r) and invalid(r),
as well as eff(r) and the evaluation of whether it was efficient to cache
or not.  That meant I had to create the functions for Pt(r) and Ct(r).

The functions as defined by your initial algorithm used the sum notation,
which means we needed to maintain a list of samples for the resource.  I.e.
the identified cost functions were functions of the resource and the time of
the request.  The more samples maintained, the less efficient the cache
evaluation became.  Now, considering I have a Windows-based box, the JVM
reports time with a granularity of 10ms, which is not enough for truly
evaluating the performance of the algorithm.
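In other words, the first cut looked roughly like this (simplified to show
the shape of the problem, not the whitepaper's exact functions):

import java.util.Iterator;
import java.util.List;

// Every request appends a sample, and every evaluation walks the whole
// list, so the cost of evaluating the cache grows with the number of
// samples kept per resource.
public class SampledCost {
    /** Average production time over all recorded samples for one resource. */
    public static double averageCost(List samplesMs) {
        if (samplesMs.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (Iterator i = samplesMs.iterator(); i.hasNext();) {
            sum += ((Double) i.next()).doubleValue();
        }
        return sum / samplesMs.size();
    }
}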

The better example that you had in your whitepaper had three continuous cost
values (C1, C2, and C3).  C1 was the cost of direct production without caching,
C2 was the cost of using the cached value, and C3 was the cost of production
and storage using the cache.  I.e. we had P, Pc, and l continuously evaluated.
These three samples were being adaptively applied to the overall eff evaluation.

I will have to play with the improved pseudo-code to provide a better idea.
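In the meantime, a very rough sketch of what I mean; the exponential-moving-
average update is my own stand-in, not necessarily what the whitepaper
prescribes:

// Three continuously maintained cost values instead of per-request sample
// lists.  The values start at zero and warm up as observations arrive.
public class ContinuousCost {
    private double c1;  // cost of direct production, no cache
    private double c2;  // cost of serving the cached value
    private double c3;  // cost of production plus storing into the cache
    private final double alpha = 0.2;  // smoothing factor (assumed)

    public void observeDirect(double cost) { c1 = mix(c1, cost); }
    public void observeHit(double cost)    { c2 = mix(c2, cost); }
    public void observeStore(double cost)  { c3 = mix(c3, cost); }

    private double mix(double old, double sample) {
        // exponential moving average: recent samples count more
        return (1.0 - alpha) * old + alpha * sample;
    }

    /** Caching pays off when a hit plus the amortized store cost beats
        direct production; hitRatio is the observed cache hit ratio. */
    public boolean efficientToCache(double hitRatio) {
        double withCache = hitRatio * c2 + (1.0 - hitRatio) * c3;
        return withCache < c1;
    }
}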


> I think it's the case and today's CPUs don't make it slower to use 
> floating-point math. Still, the hyperbolic tangent function was just one 
> solution. There is at least one other family of simple polynomial 
> functions that exhibits the same overall properties.

:)  It depends on your processor architecture, really.  Most 32-bit
architectures like Pentium and PowerPC still lean more strongly in the
direction of integer performance.  Most 64-bit architectures like Itanium,
Alpha, MIPS, etc. lean more strongly in the direction of floating-point
performance.  In the end, I would favor something simple like tanh() as
opposed to the complexity of a polynomial function.  It is simply easier
to read.
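Just to show what I mean by easier to read: one line squashes an unbounded
cost into a bounded range with no coefficients to tune.  The rational
alternative below is only an illustration, not necessarily the family you
have in mind.

// tanh() maps any cost onto (-1, 1) with no constants to pick; the
// rational function below has a similar saturating shape but is only a
// stand-in for the polynomial family Stefano mentions.
public class Squash {
    public static double withTanh(double x) {
        return Math.tanh(x);
    }

    public static double withRational(double x) {
        return x / (1.0 + Math.abs(x));
    }
}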

>> Perhaps seeing what you want in code would be the best thing.  It would
>> help solidify things for those of us who don't do the math or are
>> overwhelmed by a very long whitepaper and trying to derive from it how it
>> will practically make our lives better.  It would help determine the
>> cost/benefit ratio of actually developing this thing.  As it stands, any
>> idea that takes 18 pages to explain gives the impression of a very high
>> cost.  Whether this bears out in practice or not remains to be seen.
> 
> I can't tell you when, but I'll do this.
> 
> In the end, it might all be an academic exercise that doesn't work IRL, 
> I don't know. But in case it does, it would be a very different system 
> that would provide Cocoon with even more advantages over other solutions 
> (or, at least, that's my goal).

I believe I am starting to see the light.

-- 

"They that give up essential liberty to obtain a little temporary safety
  deserve neither liberty nor safety."
                 - Benjamin Franklin

