cocoon-dev mailing list archives

From Judson Lester <>
Subject Re: Adaptive Caching [was Re: initial checkin of the Scheme code]
Date Wed, 19 Dec 2001 21:23:38 GMT
Please forgive me if I'm being a buttinsky, but...

> > What I do not understand is: having an XML fragment produced by a given
> > generator (that gets an invoice from the database and "translates" it
> > into XML), do you _always_ track costs per each invoice instance coming
> > from that generator (one entry with the average cost of generating XML
> > for invoice #12456 and another one for invoice #12122) or could you keep
> > ONE cost entry aggregating the cost of all invoices coming from that
> > generator (only one entry with the average cost of generating XML for
> > any invoice, #12456's cost added together with #12122's).
> Sorry, I didn't understand what you were saying.

As I understand Paulo, say you have a situation where there's an invoice 
report in the web app.  The user can request a listing of invoices by number, 
and the invoice is looked up and a report is generated.  This process is 
assumed to be very similar in a consumption-of-resources way, regardless of 
which invoice the user selects.  However, the data produced for a request for 
invoice X is completely different from that for invoice Y, so, obviously, 
returning the cached version of X if Y is requested is utterly wrong.  

So, each invoice request should have its own cache entry, but all cache 
entries might share a single sampling key.  The advantage, supposedly, is 
not the reduced storage of sampling data (although that would also follow), 
but the improved quality of the sampling data for the entire class of 
invoice requests.
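To make that separation concrete, here is a minimal sketch (all names here are my invention, not Cocoon's actual cache API): cache entries stay per-invoice, while cost samples pool under one shared sampling key.

```python
# Sketch: per-resource cache entries, one shared cost-sampling key.
# Hypothetical names -- not Cocoon's actual caching API.

class SamplingCache:
    def __init__(self):
        self.entries = {}   # cache key (e.g. invoice number) -> cached XML
        self.samples = {}   # sampling key -> observed generation costs

    def record_cost(self, sampling_key, cost):
        self.samples.setdefault(sampling_key, []).append(cost)

    def avg_cost(self, sampling_key):
        costs = self.samples.get(sampling_key, [])
        return sum(costs) / len(costs) if costs else None

cache = SamplingCache()
# Invoice #12456 and #12122 are distinct cache entries...
cache.entries["invoice-12456"] = "<invoice>...</invoice>"
cache.entries["invoice-12122"] = "<invoice>...</invoice>"
# ...but both contribute cost samples to the one "invoice-report" key.
cache.record_cost("invoice-report", 40.0)
cache.record_cost("invoice-report", 60.0)
print(cache.avg_cost("invoice-report"))  # 50.0
```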

	a> Every invoice uses its own sampling data.  The vast majority of the tens 
of thousands of invoices are requested incredibly infrequently, while some 
(the most recent) will be requested several times, and then their frequency 
of request will drop off precipitously.  On some level, the same caching 
lesson is learned over and over again by the caching system.

	b> All invoice reports share a single sampling key.  The vast majority are 
still requested only very infrequently, with the most recent being requested 
more frequently.  Now, though, the system can adapt to the best caching policy 
for invoice reports as a whole (which will probably be to almost always 
regenerate), which leaves the most recent being regenerated more frequently 
than they might need to be.  
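A toy contrast of the two options, with a made-up, skewed request log: under (a), even the best-sampled per-invoice key sees only a handful of observations, while under (b) every request feeds the one shared key.

```python
# Toy contrast of options (a) and (b): per-invoice sampling keys vs.
# one shared key.  The request log is invented for illustration.
from collections import Counter

# Skewed toward the newest invoices, with a long tail of one-offs.
requests = [12456] * 5 + [12455] * 3 + list(range(12000, 12010))

per_invoice = Counter(f"invoice-{n}" for n in requests)   # option (a)
shared = Counter("invoice-report" for _ in requests)      # option (b)

print(max(per_invoice.values()))  # 5  -- best-sampled individual key
print(shared["invoice-report"])   # 18 -- every request feeds one key
```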

Paulo, have I represented your ideas correctly?  Stefano, does this make more 
sense?
<aside type="RT">
This is the (arguably correct) behavioral inverse of the focus of this 
adaptive caching policy.  It's been said that if the cost of using the cache 
is lower, it is more likely to be used.  However, it's also correct that a 
more costly caching operation will be used less often.  

Of course, this presents the additional complexity (although with effort it 
might become 'sophistication' :-O ) of group membership.  For instance, it's 
intuitively obvious that there is an age-based partition that could be made 
on the invoice generator group, and that "new" invoices have a different key 
than "old" invoices: new invoices would partake of the new-invoice 
key and old invoices of the old-invoice key.  Finally, what if those 
partitions were fuzzy, and any invoice could be .8 new and .2 old?  I don't 
think that complicates the math unduly.  (Can you tell I studied with a fuzzy 
logic prof?)
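The arithmetic really is simple: a cost sample is apportioned to each partition's key by its membership weight, with the weights summing to one.  A quick sketch with invented numbers:

```python
# Apportion one observed generation cost across fuzzy partition keys.
# Weights must have unit sum; an invoice that is 0.8 "new" and 0.2 "old"
# contributes 80% of its cost sample to the new-invoice key.
# Key names and numbers are made up for illustration.

def apportion(cost, memberships):
    """memberships: dict of partition key -> membership weight (unit sum)."""
    assert abs(sum(memberships.values()) - 1.0) < 1e-9
    return {key: cost * w for key, w in memberships.items()}

shares = apportion(100.0, {"invoice-new": 0.8, "invoice-old": 0.2})
print(shares)  # {'invoice-new': 80.0, 'invoice-old': 20.0}
```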

The natural implementation of this would be for each node of a pipeline to 
have a key, but for the generator to be able to provide a method that 
specifies partitions for the response to a particular request, along with 
their unit-sum weights.  Thus, sample keys are the pipe-path(?) plus an 
optional partition, and a specific request might partake of, and contribute 
to, multiple sampling keys.
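As a sketch of what that interface might look like (the key scheme and function are my invention, not an existing Cocoon API): the sampling key defaults to the pipe-path alone, and a generator that reports partition weights fans the sample out across partitioned keys.

```python
# Sketch: a sampling key is the pipeline path plus an optional partition.
# A generator may report fuzzy partition weights for a given request;
# each weight routes a share of the cost sample to that partition's key.
# The "#partition" key syntax is hypothetical.

def sampling_keys(pipe_path, partitions=None):
    """Return (key, weight) pairs for one request.
    partitions: optional dict of partition name -> unit-sum weight."""
    if not partitions:
        return [(pipe_path, 1.0)]
    return [(f"{pipe_path}#{name}", w) for name, w in partitions.items()]

# A request with no partitions samples under the plain pipe-path key:
print(sampling_keys("generate/invoice"))
# -> [('generate/invoice', 1.0)]

# A "mostly new" invoice contributes to both partition keys:
print(sampling_keys("generate/invoice", {"new": 0.8, "old": 0.2}))
# -> [('generate/invoice#new', 0.8), ('generate/invoice#old', 0.2)]
```

This stays backwards compatible in the sense the mail describes: a generator that never reports partitions behaves exactly as before, with one sample key per pipeline node.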

This would be an extension to the adaptive caching with sampling groups, and 
would be backwards compatible.  I wonder about its utility...  But I think 
Paulo's sampling groups idea has merit.


