cocoon-dev mailing list archives

From Berin Loritsch <>
Subject Re: [RT] Adaptive Caching
Date Fri, 18 Jul 2003 12:58:52 GMT
Geoff Howard wrote:

> Well, since Peter's dragged me into this... ;)
> Hunsberger, Peter wrote:
>> Stefano Mazzocchi <> writes (and writes, and writes,
>> and writes):
>> <small snip/>
>>> WARNING: this RT is long! and very dense, so I suggest you to turn on 
>>> your printer. 
> Stefano, I started writing a response back about 5 minutes after getting 
> your original RT but started getting the idea I hadn't fully understood 
> the RT and haven't had time to go back in more detail.  I'm very 
> interested in this and have been following the discussion, but have been 
> waiting to see if I really "get" it before speaking.  The following 
> would help me (and maybe others?) understand:
> Which of the following does your RT address:
> - Deciding when the overhead of caching is worthwhile on a given item.
> (and which part of the overhead - the act of storing, or the resource use)
> - Deciding when to purge the cache (aka, a better StoreJanitor/MRU)
> In the first scenario I'd have trouble seeing how this calculation could 
> be any less costly than the current.  But only testing would tell for 
> sure, and I'll be very interested to see it develop.

It isn't about being less costly to come to the decision; it is about making
better decisions.  Consider it a more advanced branch detector.

> The second scenario has little to argue against it.  I missed, however, 
> whether taking the frequency of matching requests into account is possible.  In other 
> words, if I have 100 reports whose cost weighs high but are only 
> requested several times a month and are reasonable to have to wait for, 
> and other items with a smaller cost but are requested thousands of times 
> daily can I come up with a cost function that favors the latter?

The search algorithm (i.e. the decision-making process) will naturally
favor the latter; the cost function is the feedback loop that guides it.
We could apply the same decision-making process to purging.  There, the
cost function would be written based on cost saved: if I remove this
resource, how much cost do I save?  If the saved costs exceed the production
costs, then the item should be purged.
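To make the idea concrete, here is a rough Python sketch of that purge
decision.  All of the names and numbers are made up for illustration; none
of this is Cocoon API:

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    production_cost: float   # cost (say, ms) to regenerate the resource
    storage_cost: float      # cost of keeping it cached (memory pressure)
    hits_per_hour: float     # observed request frequency

def should_purge(entry: CacheEntry) -> bool:
    # Cost saved by removing the entry: we stop paying for storage.
    saved = entry.storage_cost
    # Cost incurred by removing it: future hits pay production again.
    incurred = entry.production_cost * entry.hits_per_hour
    return saved > incurred

# A cheap-to-store-nothing, rarely requested report is a purge candidate...
report = CacheEntry(production_cost=5.0, storage_cost=100.0, hits_per_hour=0.01)
# ...while a hot page stays cached, even though it is cheap to regenerate.
hot_page = CacheEntry(production_cost=2.0, storage_cost=10.0, hits_per_hour=1000.0)
```

The point is that frequency shows up in the cost function itself, so the
heavily requested item wins without any explicit rule saying so.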

> Another interesting thing about this kind of setup is that if you commit 
> to it, you could get out of all validity calculations altogether.  If 
> it's still in the cache, serve it.  I will be experimenting with this to 
> see if that gets any benefit in practice.

You might want to see the code I threw together and posted to the list
under the title "[mock] Adaptive Caching".  The purpose of the set of
classes is to determine the efficiency of the system overall.  I need
to refactor the tests to remove unnecessary randomness, but you will
see roughly how things work.  Then again, I may have it wrong--which is
why I hope that Stefano can give some feedback on it.

> This would be better IMHO if it was left to the cache's discretion to 
> cache the pushed update or not.  If it was currently cached, it would 
> make sense but otherwise not.  For instance, if I update an entire table 
> with rows which never get requested, you wouldn't want them pushed into 
> the cache especially at the expense of more valuable entries.

For the push aspect, I agree.  The only way to determine if pushing things
to the cache will help the overall system is if we had a measure of relative
cost impact on the system.  IOW, if the resource is expensive to create,
but it is likely to be requested often, we would have to evaluate the overall
cost of pushing the resource to the cache vs. the cost if the resource was
not pushed.  In that case, a report that expires in a few minutes that might
be requested once a month wouldn't be generated and pushed to the cache.
The cost of storage and purging and the chance of the resource being purged
before being requested would preclude that.  However, if a resource is
expensive to generate, is requested often, and has a long ergodic period,
then it makes sense to push it to the cache before it is requested.
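The push-vs-don't-push comparison could be sketched like this (again, a
hypothetical illustration with invented numbers, not real code from the
mock):

```python
def worth_pushing(production_cost: float,
                  storage_cost_per_hour: float,
                  requests_per_hour: float,
                  lifetime_hours: float) -> bool:
    # Cost of pushing: produce it now and pay storage for its lifetime.
    push_cost = production_cost + storage_cost_per_hour * lifetime_hours
    # Cost of not pushing: every request during the lifetime pays production.
    no_push_cost = production_cost * requests_per_hour * lifetime_hours
    return push_cost < no_push_cost

# A report that expires in minutes and is requested about once a month:
monthly = worth_pushing(production_cost=500.0, storage_cost_per_hour=1.0,
                        requests_per_hour=1 / 720, lifetime_hours=0.1)
# An expensive page with a long ergodic period, hit often:
hot = worth_pushing(production_cost=500.0, storage_cost_per_hour=1.0,
                    requests_per_hour=50.0, lifetime_hours=24.0)
```

The monthly report comes out as not worth pushing, and the hot page as
worth pushing, which matches the reasoning above.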

That is the beauty of basing all decisions on cost.  The cache may make a
few mistakes early on because there is no cost information associated with
a resource, but as time rolls on it will correct its mistakes and start
making decisions that you would not normally think are helpful, but when
you study the empirical data, you find that they are.

In essence we have an expert caching system, but it is based on a mathematical
representation of cost rather than explicit rules.  As a result it is
theoretically quicker.
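To illustrate the self-correcting part: the cache can start with a naive
cost guess and blend in each observed generation time, so early mistakes
wash out as requests flow.  A minimal sketch (the class name, smoothing
factor, and numbers are all assumptions):

```python
class CostEstimator:
    def __init__(self, alpha: float = 0.3, initial_guess: float = 1.0):
        self.alpha = alpha
        self.estimate = initial_guess  # cost guess before any observations

    def observe(self, measured_cost: float) -> float:
        # Exponential moving average: blend each measurement into the
        # running estimate, weighting recent observations more heavily.
        self.estimate += self.alpha * (measured_cost - self.estimate)
        return self.estimate

est = CostEstimator()
for cost in [100.0, 95.0, 105.0, 98.0]:   # observed generation times (ms)
    est.observe(cost)
# After a few requests the estimate has moved well away from the naive
# guess of 1.0 and toward the true cost of roughly 100 ms.
```

No explicit rule says "this resource is expensive"; the feedback loop
discovers it.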


"They that give up essential liberty to obtain a little temporary safety
  deserve neither liberty nor safety."
                 - Benjamin Franklin
