cocoon-dev mailing list archives

From Berin Loritsch <blorit...@apache.org>
Subject Re: [RT] Adaptive Caching
Date Tue, 15 Jul 2003 22:06:04 GMT
Stefano Mazzocchi wrote:

> 
> On Monday, Jul 14, 2003, at 11:29 America/Guayaquil, Berin Loritsch wrote:
> 
>> We would have to apply a set of rules that make sense in this instance:
>>
>> * If resource is already cached, use cached resource.
>> * If current system load is too great, extend ergodic period.
>> * If production time less than serialization time, never cache.
>> * If last requested time is older than ergodic period, purge entry.
>> * If memory utilization too high, purge oldest entries.
> 
> 
> here is where I start to disagree. The above rules mix concerns like 
> crazy and some of them are plain wrong.

The purpose of these rules was not to say that they should be followed,
but to expand your mind and open it to the realm of rule-based programming.

RBP, the core of intelligent agent programming, takes the facts available
to it and, based on the precedence of its rules, makes a decision.  That
decision triggers an action, which changes the state of the system.

The Intelligent Agent is the epitome of adaptive behavior, and the interaction
of a few simple rules can quite impressively give the illusion of a smart
machine.  Given enough time and the proper resources, it can optimize the cache
to the point where the best set of items for the time of day is actually
pre-loaded, giving the illusion of even greater intelligence.

In essence, the responsiveness of the cache control agent depends directly
on the efficiency of the search algorithm used to make the decisions.
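
To make that concrete, here is a minimal sketch of the kind of rule-based
cache controller I have in mind.  The names (CacheFacts, CacheRule,
CacheAction, RuleBasedCacheController) are purely hypothetical, not an
existing Cocoon API; treat it as an illustration of the shape of the thing:

import java.util.Iterator;
import java.util.List;

// Hypothetical names: an illustration, not an existing Cocoon API.
interface CacheFacts {                       // the facts available to the agent
    boolean isCached(String key);
    double  systemLoad();                    // e.g. 0.0 - 1.0
    long    productionTime(String key);      // ms to produce the resource
    long    serializationTime(String key);   // ms to serialize the cached copy
    double  memoryUtilization();             // e.g. 0.0 - 1.0
}

class CacheAction {                          // the decision the agent emits
    static final CacheAction USE_CACHE  = new CacheAction("use-cache");
    static final CacheAction DONT_CACHE = new CacheAction("dont-cache");
    static final CacheAction PURGE      = new CacheAction("purge");
    static final CacheAction DEFAULT    = new CacheAction("produce-and-cache");
    private final String name;
    private CacheAction(String name) { this.name = name; }
    public String toString() { return name; }
}

interface CacheRule {
    // Return an action if this rule fires, or null if it does not apply.
    CacheAction evaluate(String key, CacheFacts facts);
}

class RuleBasedCacheController {
    private final List rules;                // CacheRule instances, ordered by precedence

    RuleBasedCacheController(List rules) { this.rules = rules; }

    // The first rule that fires wins; its action changes the state of the system.
    CacheAction decide(String key, CacheFacts facts) {
        for (Iterator i = rules.iterator(); i.hasNext();) {
            CacheAction action = ((CacheRule) i.next()).evaluate(key, facts);
            if (action != null) return action;
        }
        return CacheAction.DEFAULT;
    }
}

All of the intelligence lives in the ordered list of rules; the decision
loop itself never changes.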

> 
> if you start to think in terms of "costs" and not "time", you'll start 
> seeing how much of your cache assumptions are very context specific and 
> not general at all.

This one set of rules optimized based on time; however, a different set of
rules would optimize for memory consumption or temporary storage instead.

In essence, we can experiment with the set of rules that creates the
best mix for a particular deployment.
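
For instance, the "never cache when production is cheaper than
serialization" rule and a memory-pressure rule would each be one small
class.  This builds on the hypothetical interfaces sketched above and is
just as illustrative:

class CheaperToRegenerateRule implements CacheRule {
    public CacheAction evaluate(String key, CacheFacts facts) {
        if (facts.productionTime(key) < facts.serializationTime(key)) {
            return CacheAction.DONT_CACHE;   // regenerating is cheaper than reading the cache
        }
        return null;                         // rule does not apply; fall through
    }
}

class MemoryPressureRule implements CacheRule {
    private final double ceiling;            // hard limit handed to us at deployment time
    MemoryPressureRule(double ceiling) { this.ceiling = ceiling; }
    public CacheAction evaluate(String key, CacheFacts facts) {
        return (facts.memoryUtilization() > ceiling) ? CacheAction.PURGE : null;
    }
}

Swapping the rule set swaps the optimization target; the decision engine
stays exactly the same.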

<snip type="stuff that I don't want to argue about">

>> As to the good enough vs. perfect issue, caching partial pipelines (i.e.
>> the results of a generator, each transformer, and the final result) will
>> prove to be an inadequate way to improve system performance.
> 
> 
> Pfff. Stating this as a general truth is simply silly.
> 
> Maybe it's hard to achieve, true, but saying that it doesn't improve 
> performance is definitely wrong. You are basically stating overall 
> general cost properties of resources, without even knowing what 
> resources you are talking about.

All I am saying is that it has more value on paper than it does in real
life.

> 
>> Here are the reasons why:
>>
>> * We do not have accurate enough tools to determine the cost of any 
>> particular
>>   component in the pipeline.
> 
> 
> And you don't need to do it. You just sample the cost of the entire 
> resource. Then let the system try different assemblies (evaluating the 
> final resource creation costs) and at that point, you have enough 
> information on what's the best strategy (but you keep updating the 
> information).
> 
> God, it's all described in my design, why people don't read what I write 
> if I put math in it? :-((((

You wrote 18 pages worth of stuff.  How detailed a look do you expect?
Keep in mind that folks like me have an associate's degree in recording
music, yet are damn good software engineers.  I haven't had the math
classes to fully understand your model, and I imagine many other Cocooners
are in the same boat as I am.

At least I am putting forth an effort.

>>
>> * The resource requirements for storing the results of partial 
>> pipelines will
>>   outweigh the benefit for using them.
> 
> 
> Again, how in hell do you know? you are assuming global properties. 
> Maybe it's the case for 80% of the resources, but it's exactly that 20% 
> that I have no idea how to optimize and I want the cache to adapt.

I know because I am omniscient.  Silly question ;P

> 
>>  Communication heavy processes such as database communication
>>   and serialization throttle the system more than any other production 
>> cost.
> 
> 
> but maybe not. let the system adapt, don't make the guess for yourself.

I dunno.  I have observed this several times.  With Java's blocking IO
model (which Servlet technology assumes), there comes a point where the
task-switching overhead of handling new connections (i.e. more threads)
becomes the dominant cost of the system.  However, the poor Servlet writer,
and consequently the Cocooner, has no way of controlling this.

The only way that Cocoon *could* shed load in such a situation is to
simply not process the request and drop the person.
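
To be concrete about what "drop the person" means, about the only lever
available at the Servlet level is something like the following rough
sketch.  This is not actual Cocoon code, and the threshold is made up:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;

// Rough sketch: reject requests outright once too many are already in flight.
public class LoadSheddingFilter implements Filter {
    private static final int MAX_CONCURRENT = 100;   // made-up threshold
    private int active = 0;

    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        synchronized (this) {
            if (active >= MAX_CONCURRENT) {
                ((HttpServletResponse) res).sendError(
                    HttpServletResponse.SC_SERVICE_UNAVAILABLE);
                return;
            }
            active++;
        }
        try {
            chain.doFilter(req, res);
        } finally {
            synchronized (this) { active--; }
        }
    }
}

It keeps the thread pool from being swamped, but it is a blunt instrument:
it sheds users, not cost.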

> 
> it sounds like magic, I know, but read the math, it's all there.

I need a magic math translator.

> 
>>
>> For this reason, providing a generic cache that works on whole 
>> resources is
>> a much more efficient use of time.
> 
> 
> you base assumptions on assumptions and come up with a general 
> answer? c'mon.

Remember, I'm omniscient <g>

> 
>> For example, it would make my site run
>> much more efficiently if I could use a cache for my database bound 
>> objects
>> instead of creating a call to the database to re-read the same 
>> information
>> over and over.
> 
> 
> hey, careful, an adaptive cache will never be smarter than you. Example: 
> if you make a database connection to understand if the database data 
> has changed, the cost difference between caching and not caching will 
> very likely be small, if not negative.
> 
> If you use an event-driven, control-inverted approach and the database 
> calls you to signal the changes, the cache will be much more efficient.
> 
> my cache design optimizes what's there, it doesn't write code for you.

Understood, but if the interface or the component lets you hook in
persistence mechanisms for that type of resource, it can be made to adapt
in ways that lower the overall cost to the server.  If I write part of the
code, the cache will take care of the rest.
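
Something along these lines is what I have in mind: the cache exposes an
invalidation hook, I supply the small piece that knows when the database
data actually changed, and the cache adapts around it.  The names are
illustrative only:

// Illustrative only: the cache exposes an invalidation hook, I supply
// the piece that knows when the database data actually changed.
interface CacheInvalidationListener {
    void resourceChanged(String key);
}

class DatabaseBoundResource {
    private final CacheInvalidationListener cache;

    DatabaseBoundResource(CacheInvalidationListener cache) {
        this.cache = cache;
    }

    // Called from a database trigger or notification instead of polling on
    // every request; the cached copy stays valid until we hear otherwise.
    void onDatabaseNotification(String key) {
        cache.resourceChanged(key);          // purge or refresh just this entry
    }
}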

> 
> The cache system is given with:
> 
>  1) a cost function
>  2) a pipeline setup
>  3) a set of components
> 
> this creates an overall cost function of the 'operation' of this system.
> 
> by running the system, using requests as cost sampling and cost analysis 
> as a way to drive the sampling, the cache system converges to a 
> time-local minimization of the overall cost of running the system.
> 
> I can also provide you with numbers on how efficient the minimization 
> was and give you insights on those resources where the caching is 
> inefficient (you can call them hotspots, if you want) or provide you 
> strategies on cost effectiveness (for example, it can forecast the 
> minimization effectiveness as a function of memory or, for example, disk 
> speed)
> 
> With *this* information, gathered on a running system, you can tune:
> 
>  1) your cost function
>  2) the physical resources that enter your cost function (cpu speed, 
> memory, disk speed, network speed)
>  3) the caching logic of the hotspots
> 
> Of course, the cache should not do the tuning itself, but merely try 
> to do its best with what you give it. Then it gives you numbers that 
> allow you to further tune the system.

I understand the general idea; however, considering my ignorance of hard
mathematics, I need it broken down for me: how can a layperson know how
to optimize for the resource in question?
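
For what it is worth, here is the most I can reconstruct of the sampling
idea in code rather than math.  This is my own, almost certainly
oversimplified, reading of your design, not the design itself:

// My own (probably oversimplified) reading: keep a running estimate of the
// cost of serving a resource cached and uncached, and prefer whichever
// strategy is currently cheaper.  The smoothing factor is arbitrary.
class CostSampler {
    private double cachedCost   = Double.MAX_VALUE;  // running estimate
    private double uncachedCost = Double.MAX_VALUE;  // running estimate
    private static final double ALPHA = 0.2;         // smoothing factor

    void recordCached(double cost)   { cachedCost   = smooth(cachedCost, cost); }
    void recordUncached(double cost) { uncachedCost = smooth(uncachedCost, cost); }

    boolean shouldCache() { return cachedCost < uncachedCost; }

    private static double smooth(double old, double sample) {
        return (old == Double.MAX_VALUE) ? sample : (1 - ALPHA) * old + ALPHA * sample;
    }
}

If that is roughly right, I can follow the argument; it is the step from
there to the formal cost minimization that loses me.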

The first thing most people tackle is speed; then the sys admins come
after them saying that it is using too much memory.  The sys admin will
come up with an acceptable top value, but how does the layperson plug
that information in when it is time to deploy?

You see, with a rule-based system you can express those concerns fairly
easily and watch the behavior adapt.  With a pure mathematical function
it takes a genius (or someone with a doctorate) to understand how to
tune the system without recompiling.

We need to be able to plug in hard parameters and then let the cache
do its thing.
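
Concretely, I picture deployment looking something like this, building on
the hypothetical classes sketched earlier.  The hard limits from the sys
admin become plain parameters, and the rules themselves never change:

import java.util.ArrayList;
import java.util.List;

// Hypothetical deployment wiring: hard limits from the sys admin become
// plain parameters; the decision engine and the rules stay untouched.
public class CacheWiring {
    static RuleBasedCacheController configure() {
        List rules = new ArrayList();
        rules.add(new MemoryPressureRule(0.75));     // "no more than 75% of the heap"
        rules.add(new CheaperToRegenerateRule());
        return new RuleBasedCacheController(rules);
    }
}

In a real deployment those numbers would come from a configuration file
rather than code, but the point stands: a hard parameter goes in, and the
behavior adapts around it.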

> 
> Please, don't reply without having understood what I wrote and if there 
> is something unclear, please ask, don't just assume.
> 
> TIA
> 
> -- 
> Stefano, a little depressed by the fact that what he considers the best 
> idea of his entire life is not even barely understood :-(

Not all of us are super-geniuses.  Dumbing it down a lot will help
immensely.  The problem is that it is already 18+ pages of hard-to-read
stuff.  Then when we don't understand it we get verbally flogged.
Sorry I pooh-poohed your idea.  I'm sorry I don't have the mental capacity
to constructively contribute.

Truth be told, there are bigger fish to fry, and I will concentrate on
those.  Have fun with the cache.

--
Berin, a little perturbed by assumptions upon assumptions about the
academic level of understanding of his fellow contributors. :(

-- 

"They that give up essential liberty to obtain a little temporary safety
  deserve neither liberty nor safety."
                 - Benjamin Franklin

