cocoon-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [RT] Adaptive Caching
Date Wed, 16 Jul 2003 18:59:25 GMT

On Wednesday, Jul 16, 2003, at 07:38 America/Guayaquil, Berin Loritsch 
wrote:

> My
> statements about the limited value (cost/benefit ratio) of partial 
> pipeline
> caching has to do with *my* experience.  Maybe others have had 
> different
> experiences, but all my dynamic information was always encapsulated in
> one place: the generator.  The transformers had an infinite ergodic
> period (infinite in that they did not change until there was a 
> development
> reason to do so).  The serializers were quite simple.

The CIncludeTransformer has an undefinable ergodic period, since it 
depends on the aggregated resources and these are known only at runtime.

It is true that, for caching reasons, we tried to enforce people to 
move everything that influenced caching at generation stage (and avoid 
making serializers have access to the environment), but this is not 
always possible, as the CIncludeTransformer suggests.

Sure, it is possible to have the cinclude behavior into the generator, 
but the whole benefits of pipelining goes down the drain.

we must be more general than what you suggest or the cache will only 
adapt to your specific cases.

> With that combination facts, the only thing in my pipeline with an
> ergodic period less than infinity was the generator.  For my static
> pages, I could have simply compiled the results and served them.  For
> my dynamic pages, the generation was pretty darn quick.
>
> Then again, maybe my *experience* is not typical, which means what is
> good enough for me is not good enough for others.

Exactly, but being the caching more general and adapting to the optimal 
solution by itself, what is good enough for a general concept, should 
be good enough for the specific one.

> The basic crux of programming Intelligent Agents is that agents react
> to the state of the environment, discover the best response, and act
> (usually in a way that affects the environment).
>
> What you outlined was the "search algorithm" (the discovery method),
> and the maner in which the agent (cache controller) acted upon the
> environment.
>
> What you left as an excersize to the reader was the Cost function.

See my email to Marc for more insights.

> The
> rules-based approach is one way of achieving that cost function 
> declaratively.

True. But I prefer a purely statistical approach because I'm lazy and I 
don't want write those rules: I want to the system to find them out for 
me. ;-)

> Any time you incorporate conditional logic in your cost function, you
> have explicit rules.

True. But I showed that you don't need to. Costs are always calculated 
the same way (which is by the economical definition of a defined cost, 
BTW)

> Any time you have a weighted cost that factors in
> different elements you have implicit rules.

Again true and I concur that finding out those weights can be tricky. 
But, IMO, choosing three numbers (in case of the time/memory/disk 
function) is much easier than finding out each and every rule.

> Explicit rules are easier to
> debug and understand, also to predict.

Here I disagree.

An example from spam filtering might give you an insight of what I 
mean. My bogofilter found out that the email with #FF0000 where likely 
to be spam. This is because #FF0000 is the HTML code for the red color 
and spam use red to catch attention (since then, spammer understood 
that in order to get thru, spam needs to look like regular email so 
they avoided that and this rule is much less effective).

My point is: I don't know about you, but I would have never came up 
with a rule such as "if it contains #FF0000, then increase the 
spamicity" (spamicity for bogofilter is the measure of the likeliness 
of an email being spam, it is a real number between 0.0 and 1.0, where 
1.0 is definately spam, 0.0 is definately ham)

Moving this in the realm of caching, as I previously state, it might be 
the case that the cache is efficient between 11:30AM to 2:00PM but 
never in the rest of the day. How are you going to find out that rule 
by yourself? Besides, suppose that your web site attracts more people 
from other timezones: you have to change the rules accordingly.

I can make tons of examples on why an adaptive system is much better 
than a rule engine and, in general, can perform equally better given a 
reasonable amount of statistical information.

> There are different ways of applying
> explicit rules without resorting to if/then/else or switch/case 
> functions.
> In fact you might be able to come up with a way to translate explicit 
> rules
> into a function with implicit rules.

This is exactly what I did.

> Truth be told, your search algorithm might be as efficient as they 
> come.
> It might not.

True. It has to be demostrated. But the incredible ability of 
bogofilter gives me hope that it is the case even for caching.

> But do keep in mind the poor administrator/installer.  The guy who is
> managing the install is more interested in the efficiency of the cache
> and how Cocoon uses resources than the developer.  The user in many 
> cases
> won't know the difference because their network connection is 
> throttling
> the response, or their browser is the bottleneck.  The efficiency of 
> the
> cache will likely have the most impact on the scalability of the 
> server.
> That is my OPINION, and should not be taken as gospel.  Please do not
> shoot me for holding that opinion.

Not at all. When implemented, the dynamic adaptability will be 
configurable and you can turn it off or plug in your own cache if you 
dislike it.

Still, keeping in mind the poor administrator/installer, I don't see 
how a rule engine can be easier to configure than a system that doesn't 
need anything to learn and adapting and optimizing resources as it 
works. by itself and with no need for supervision.

> I took the 18 pages home and started to play with some code.  I 
> appreciate
> the journey to get to the search algorithm you proposed.  The functions
> as declared in the beginning were not that efficient, and on my machine
> as soon as you had more than ~410 samples for the invalid/valid cost
> evaluations the function took ~10ms (the granularity of my clock) to
> evaluate.  The evaluation used constants for all the cost figures 
> because
> that would be the computationally cheapest way to evaluate the 
> efficiency
> of those functions.
>
> After this exercise, to my dismay, was the better way of evaluating a
> continual cost value.  That way instead of having a sample for each
> request at a particular time, we only had to work with three samples.
> A vastly improved search algorithm in terms of being computationally
> cheap.

Hmmm, I'm very intersted on this, can you elaborate more?

> Before it was unclear what type would best represent the cost function,
> but when you introduced the trigonometric functions into the mix, it 
> was
> clear that floating point precision was required.

I think it's the case and today's CPU don't make it slower to use 
floating-point math. Still, the hyperbolic tangent function was just 
one solution. there is at least another family of simply polynomial 
functions that exhibit the same overall properties.
>
> Perhaps seeing what you want in code would be the best thing.  It would
> help solidify things for us that don't do the math or are overwhelmed 
> by
> a very long whitepaper and trying to derive from it how it will 
> practically
> make our lives better.  It is would help determine the cost/benefit 
> ratio
> of actually developing this thing.  As it stands, any idea that takes 
> 18
> pages to explain gives the impression of a very high cost.  Whether 
> this
> bears out in practice or not remains to be seen.

I can't tell you when, but I'll do this.

At the end, it might all be an accademic exercise that doesn't work 
IRL, I don't know. But in case it does, it would be a very different 
system that would provide cocoon with even more advantages over other 
solutions (or, at least, that's my goal).

thanks for taking the time to investigate more into this. very 
appreciated.

--
Stefano.


Mime
View raw message