cocoon-dev mailing list archives

From Berin Loritsch <blorit...@apache.org>
Subject Re: [RT] Adaptive Caching
Date Fri, 18 Jul 2003 18:27:23 GMT
Hunsberger, Peter wrote:

> Stefano Mazzocchi <stefano@apache.org> responds:
> 
> <snip/>
> 

>>I think I really lost you here. What does it mean to "retain the
>>intermediate results of the parser"? What are you referring to? And
>>what kind of database do you envision in a push pipe? Sorry, I don't
>>get it, but I smell something interesting, so please elaborate more.
> 
> 
> Let me try this a different way:  one of the design decisions driving
> the use of SAX over DOM is that SAX is more memory efficient.  However,
> if you're caching SAX event streams this is no longer true (assuming the
> SAX data structures and DOM data structures are more or less equivalent
> in size).  Thus, caching calls into question the whole way in which
> parsing and transformation work: if you're going to cache, why not cache
> something which is directly useful to the parsing and transformation
> stage instead of the output?  It's a bit of a radical thought because in
> a way you're no longer assembling pipelines.  Rather, you're pushing
> data into a database and pulling data out of the database.  The database
> just happens to work as a cache at the same time as it works for storing
> parser output.  Since it's a database, the contents can be normalized
> (saving space); since it feeds transformers directly, it saves parsing
> overhead (saving CPU).  (Recall the discussion we had on push vs. pull
> parsing and lazy evaluation.)
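
To make the trade-off concrete: caching a SAX event stream amounts to
recording the event calls so they can be replayed into any downstream
handler without re-parsing.  A minimal sketch of the idea in Java (the
SaxRecorder class is hypothetical, not Cocoon's actual implementation):

import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRecorder extends DefaultHandler {
    /** One recorded SAX event, able to replay itself. */
    private interface Event { void replay(ContentHandler out) throws SAXException; }
    private final List<Event> events = new ArrayList<>();

    public void startElement(String uri, String local, String qName, Attributes atts) {
        final Attributes copy = new AttributesImpl(atts); // parsers reuse atts; copy it
        events.add(out -> out.startElement(uri, local, qName, copy));
    }
    public void endElement(String uri, String local, String qName) {
        events.add(out -> out.endElement(uri, local, qName));
    }
    public void characters(char[] ch, int start, int len) {
        final char[] copy = new char[len];             // the char buffer is reused, too
        System.arraycopy(ch, start, copy, 0, len);
        events.add(out -> out.characters(copy, 0, copy.length));
    }
    /** Push the cached stream into any consumer -- no parsing involved. */
    public void replay(ContentHandler out) throws SAXException {
        for (Event e : events) e.replay(out);
    }
}

Whether such a recording lives on the heap, in a database, or normalized
as suggested above is then purely a storage question.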

What you described here is a function of storage and retrieval.  No matter
how that is changed or optimized, the decision of whether or not to cache
is still up to the algorithm that Stefano described.  Your proposal would
have an effect on the output of the cost function--hopefully the cost would
be smaller.  That said, using a standard cost function lets you determine
authoritatively whether your storage mechanism is truly cheaper or not,
because we get a single value that takes the different parameters into
consideration.

You might find that your storage solution will cost less with some cost
functions and more with others.  In that case, we would be able to tune
the storage mechanism with the cost function that suits the system best.
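
A sketch of what such a pluggable cost function might look like--the
interface and its parameters are illustrative only, not an existing
Cocoon API:

/** Hypothetical pluggable cost model for cache entries. */
public interface CacheCostFunction {
    /**
     * @param productionTimeMs time to regenerate the resource from scratch
     * @param retrievalTimeMs  time to fetch it from this storage mechanism
     * @param sizeBytes        space the cached form occupies
     * @return net cost of keeping the resource cached (negative = caching pays)
     */
    double cost(long productionTimeMs, long retrievalTimeMs, long sizeBytes);
}

/** One possible model: trade space against time saved at a tunable rate. */
class TimeSpaceCostFunction implements CacheCostFunction {
    private final double bytesPerMs; // exchange rate between memory and time

    TimeSpaceCostFunction(double bytesPerMs) { this.bytesPerMs = bytesPerMs; }

    public double cost(long productionTimeMs, long retrievalTimeMs, long sizeBytes) {
        long timeSavedMs = productionTimeMs - retrievalTimeMs;
        return sizeBytes - (timeSavedMs * bytesPerMs);
    }
}

Swapping in a different implementation would then let us score a
database-backed store and the current in-memory store against exactly
the same yardstick.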

>>
>>This said, I admit there are tons of other ways to achieve similar
>>functionality. I just expressed one.
> 
>  
> Yes, I guess I'll comment here instead of on that part of the thread.
> Two thoughts: 
> 
> 1) Since you aren't tracking a history of events, there is no
> relationship to Fourier transforms and sampling periods; they're not
> relevant.  Only if you mapped a specific load to a particular cost would
> a period apply (and then it would be in relation to load, not time!).
> Creating a map of load to cost for each fragment producer would be
> possible, but how do you measure "load" in a meaningful way that can
> be extrapolated to individual producer behavior under global load?  I
> don't think you can without consuming a lot of resources...

Any time we track a period of history, that does affect things.  Given
the need for global load measurements and the 10ms granularity of many JVM
clocks, this might be a function best suited for JNI integration.  The JNI
interface would provide hooks to obtain more precise memory info and more
precise timing info (10ms granularity means that any request taking less
than 10ms is measured as 0ms--clearly not adequate), as well as a hook to
obtain system load.  System load is readily available in UNIX environments,
and it would need to be translated for Windows environments, but it is
something sysadmins care greatly about.
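
Something along these lines--the native methods below don't exist
anywhere; they are just a sketch of the hooks I mean:

/** Hypothetical JNI facade for precise timing, memory, and load hooks. */
public final class SystemInfo {
    static { System.loadLibrary("sysinfo"); } // hypothetical native library

    public static native long preciseTimeMicros();    // finer than the ~10ms Java clock
    public static native long freeMemoryBytes();      // more precise than Runtime's view
    public static native double systemLoadAverage();  // UNIX 1-minute load average

    private SystemInfo() {}
}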

> 2) I do think you may want some randomness.  However, at this point I
> don't think it applies.  Consider a simple system with, say, 5 resources.

<snip/>

> At this point we have found the cost of producing the items under load.
> If the system load now starts to decrease, we have to decide whether the
> cost measured under load is still useful or whether we need to
> re-evaluate the cost under reduced load.  The only way we can re-evaluate
> it is to arbitrarily force the item out of the cache.  At first it would
> seem that, since the system is under reduced load, there are resources
> available to process the item that had the lowest cost under load, even
> if it is not the lowest-cost item under reduced load.  However, this does
> not guarantee the best global response times under reduced load.  One can
> argue (I believe prove) that if you pick your cost functions properly,
> you will arrive at the best global response time.  This means the
> question is how to pick the cost function.

By separating the cost function out, we can experiment with it until
we arrive at something that works.
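
For instance, the forced re-evaluation you describe could be as simple as
occasionally treating a hit as a miss, so the production cost gets
re-measured under current load.  A sketch, with hypothetical names:

import java.util.Random;

/** Sketch: occasionally discard a hit so cost is re-sampled under current load. */
public class ResamplingPolicy {
    private final Random random = new Random();
    private final double resampleProbability; // e.g. 0.01 under light load, 0 at full load

    public ResamplingPolicy(double resampleProbability) {
        this.resampleProbability = resampleProbability;
    }

    /** True if this cache hit should be thrown away and the resource regenerated. */
    public boolean shouldResample() {
        return random.nextDouble() < resampleProbability;
    }
}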

> So, we see everything comes back to picking the right cost algorithm.
> This is the NP-hard packing problem (in another guise).  That is why it
> makes some sense to allow for the possibility of introducing randomness
> at this point: we can use Monte Carlo-like simulation to make up for
> picking less-than-optimal cost functions.

Yep.
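
A rough sketch of what that Monte Carlo-like behavior could look like:
instead of always ejecting the entry with the lowest recorded cost, pick
a victim at random with probability biased toward low-cost entries
(Entry is a hypothetical type, not part of any current cache):

import java.util.List;
import java.util.Random;

public class RandomizedEvictor {
    public interface Entry { double cost(); } // hypothetical cache entry

    private final Random random = new Random();

    /** Pick a victim; eviction probability is inversely proportional to cost. */
    public Entry pickVictim(List<Entry> entries) {
        double total = 0.0;
        for (Entry e : entries) total += 1.0 / Math.max(e.cost(), 1e-9);
        double r = random.nextDouble() * total;
        for (Entry e : entries) {
            r -= 1.0 / Math.max(e.cost(), 1e-9);
            if (r <= 0.0) return e;
        }
        return entries.get(entries.size() - 1); // floating-point fallback
    }
}

With a perfect cost function the randomness only adds noise; with an
imperfect one it keeps us sampling entries the deterministic policy
would never re-examine.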

> Two important results:
> 
> 1) It only makes sense to fall back to introducing randomness under
> conditions of less than full load or if we are thrashing.  Both of these
> can be hard to determine; thrashing cycles can be long and involved.
> Under full load or near-full load, re-evaluation will be forced in any
> case (if resources are impacted enough for it to matter).

How do you know what full load is?  100% CPU utilization?  100% memory
utilization?  Just because the CPU is running at 100% does not mean that
we have a particularly high load.  Check out a UNIX system using the
uptime command (ps only shows the per-process picture).  You will find
that there is a marked difference between 100% CPU utilization with a
load average of 1.32 and 70% CPU utilization with a load average of 21.41.
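
For illustration, on Linux the load averages can be read straight from
/proc/loadavg (Linux-specific; other UNIXes would need getloadavg() via
JNI):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/** Sketch: /proc/loadavg looks like "0.42 0.37 0.30 1/123 4567". */
public class LoadAverage {
    public static double oneMinute() throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("/proc/loadavg"))) {
            return Double.parseDouble(in.readLine().split("\\s+")[0]);
        }
    }
}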

> 2) Using randomness is one way of evaluating cost.  If you have a
> function that produces good cost results, then adding the randomness
> doesn't help.  In other words, Monte Carlo-like behavior can in itself
> be a cost-evaluation function.

Ok.  All this theory can be proven...

-- 

"They that give up essential liberty to obtain a little temporary safety
  deserve neither liberty nor safety."
                 - Benjamin Franklin

