cocoon-dev mailing list archives

From "Paulo Gaspar" <>
Subject RE: Adaptive Caching [was Re: initial checkin of the Scheme code]
Date Thu, 20 Dec 2001 15:05:20 GMT
Hi Judson,

> -----Original Message-----
> From: Judson Lester []
> Sent: Wednesday, December 19, 2001 10:24 PM
> Please forgive me if I'm being a buttinsky, but...

No, you made what I said much clearer. Maybe I wouldn't have written that
long post yesterday if I had seen yours before.

> > > What I do not understand is: having an XML fragment produced by a given
> > > generator (that gets an invoice from the database and "translates" it
> > > into XML), do you _always_ track costs for each invoice instance coming
> > > from that generator (one entry with the average cost of generating XML
> > > for invoice #12456 and another one for invoice #12122), or could you keep
> > > ONE cost entry aggregating the cost of all invoices coming from that
> > > generator (only one entry with the average cost of generating XML for
> > > any invoice, #12456's cost added together with #12122's)?
> >
> > Sorry, I didn't understand what you were saying.
> As I understand Paulo, say you have a situation where there's an invoice
> report in the web app.  The user can request a listing of an invoice by
> number, and the invoice is looked up and a report is generated.  This
> process is assumed to be very similar in a consumption-of-resources way,
> regardless of which invoice the user selects.  However, the data produced
> for a request for invoice X is completely different than for invoice Y, so,
> obviously, returning the cached version of X if Y is requested is utterly
> wrong.


> So, each invoice request should have its own cache request, but all cache
> requests might share a single sampling key.  The advantage, supposedly, is
> not storage of sampling data (although this would also be so), but to
> enhance the quality of the sampling data for the entire class of invoice
> requests.

Exactly. That is the point I wanted to make.
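
To make the distinction concrete, here is a minimal sketch of a cache that
stores one fragment per cache key while pooling cost samples under a shared
sampling key. All class and method names (CostStats, SamplingCache, put,
statsFor) are hypothetical and are not part of Cocoon; the cost is assumed
to be a generation time in milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Cocoon.
class CostStats {
    private long samples;
    private double totalMillis;

    void record(double millis) {
        samples++;
        totalMillis += millis;
    }

    long samples() { return samples; }

    // Average generation cost over every sample recorded under this key.
    double average() {
        return samples == 0 ? 0.0 : totalMillis / samples;
    }
}

class SamplingCache {
    // One cached fragment per cache key (e.g. "invoice/12456").
    private final Map<String, String> fragments = new HashMap<>();
    // One cost record per sampling key (e.g. "invoice"), shared by many cache keys.
    private final Map<String, CostStats> stats = new HashMap<>();

    void put(String cacheKey, String samplingKey, String fragment, double generationMillis) {
        fragments.put(cacheKey, fragment);
        stats.computeIfAbsent(samplingKey, k -> new CostStats()).record(generationMillis);
    }

    // Returns the fragment for exactly this cache key, or null if absent.
    String get(String cacheKey) { return fragments.get(cacheKey); }

    CostStats statsFor(String samplingKey) {
        return stats.getOrDefault(samplingKey, new CostStats());
    }
}
```

With this shape, invoice #12456 can never be served from the entry for
invoice #12122, yet both requests add a sample to the same "invoice" cost
record.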

> Compare:
> 	a> Every invoice uses its own sampling data.  The vast majority of the
> tens of thousands of invoices are requested incredibly infrequently, while
> some (the most recent) will be requested several times, and then their
> frequency of request will drop off precipitously.  On some level, the same
> caching lesson is learned over and over again by the caching system.
> to
> 	b> All invoice reports share a single sampling key.  The vast majority
> are still only requested very infrequently, with the most recent being
> requested more frequently.  Now, though, the system can adapt to the best
> caching for invoice reports as a whole (which will probably be to almost
> always regenerate), which leaves the most recent to be regenerated more
> frequently than they might need to be.
> Paulo, have I represented your ideas correctly?

Very correctly indeed!

> Stefano, does this make more sense?
> <aside type="RT">
> This is the (arguably correct) behavioral inverse of the focus of this
> adaptive caching policy.  It's been said that if the cost of using the
> cache is lower, it is more likely to be used.  However, it's also correct
> that a more costly caching operation will be used less often.
> Of course, this presents the additional complexity (although with effort
> it might become "sophistication" :-O ) of group membership.

Of course, but the additional complexity is very low and can be
completely optional.

Only those who want its (potentially interesting) benefit will
have to pay its (low) cost.

> For instance, it's intuitively obvious that there is an age-based
> partition that could be made on the invoice generator group, and that "new"
> invoices have a different key than "old" invoices, and that new invoices
> would partake of the new-invoice key and old invoices of the old-invoice
> key.  Finally, what if those partitions were fuzzy, and any invoice could
> be .8 new and .2 old?  I don't think that complicates the math unduly.
> (Can you tell I studied with a fuzzy logic prof?)

Interesting ideas.
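
A minimal sketch of how such fuzzy membership could work, assuming each
request carries a map from sampling key to a weight and that the weights sum
to 1. The names (WeightedCostStats, FuzzySampler) and the weighted-average
formula are illustrative assumptions, not something this thread or Cocoon
defines.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of fuzzy sampling-group membership; not Cocoon API.
class WeightedCostStats {
    private double weightSum;
    private double weightedMillis;

    // A sample counts toward this group in proportion to its membership weight.
    void record(double millis, double weight) {
        weightSum += weight;
        weightedMillis += weight * millis;
    }

    double average() {
        return weightSum == 0.0 ? 0.0 : weightedMillis / weightSum;
    }
}

class FuzzySampler {
    private final Map<String, WeightedCostStats> stats = new HashMap<>();

    // memberships maps sampling key -> weight; weights are assumed to sum to 1.
    void record(Map<String, Double> memberships, double millis) {
        checkUnitSum(memberships);
        for (Map.Entry<String, Double> e : memberships.entrySet()) {
            stats.computeIfAbsent(e.getKey(), k -> new WeightedCostStats())
                 .record(millis, e.getValue());
        }
    }

    // Expected cost for a request: membership-weighted mix of the group averages.
    double estimate(Map<String, Double> memberships) {
        checkUnitSum(memberships);
        double result = 0.0;
        for (Map.Entry<String, Double> e : memberships.entrySet()) {
            WeightedCostStats s = stats.get(e.getKey());
            if (s != null) {
                result += e.getValue() * s.average();
            }
        }
        return result;
    }

    private static void checkUnitSum(Map<String, Double> memberships) {
        double sum = 0.0;
        for (double w : memberships.values()) {
            sum += w;
        }
        if (Math.abs(sum - 1.0) > 1e-9) {
            throw new IllegalArgumentException("weights must sum to 1, got " + sum);
        }
    }
}
```

An invoice that is .8 new and .2 old would then contribute 80% of each
sample to the new-invoice key and 20% to the old-invoice key, and its
estimated cost would blend the two group averages in the same proportions.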

> The natural implementation of this would be for each node of a pipeline to
> have a key, but the generator be able to provide a method to specify
> partitions for the response to a particular request and their unit-sum
> weight.
> Thus, sample keys are the pipe-path(?) plus an optional partition, and a
> specific request might partake and contribute to sampling multiple sampling
> keys.
> This would be an extension to the adaptive caching with sampling groups,
> and would be backwards compatible.  I wonder about its utility...

Yes, with this kind of thing we can only quantify the potential advantage
of one solution over the other by testing both.

I just think it might work well and that it is not much extra coding work.

Probably it will take less extra coding time than the time we took writing
about it.
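
As a rough illustration of how little code the key itself needs, here is a
sketch of a sampling key built from a pipe-path plus an optional partition.
The class name SamplingKey, the factory methods, and the '#' separator are
all assumptions made for this example. A key without a partition behaves
like a plain per-node key, which is what would keep the extension backwards
compatible.

```java
import java.util.Objects;

// Hypothetical value type; the names and the '#' separator are assumptions.
final class SamplingKey {
    private final String pipePath;   // identifies the pipeline node
    private final String partition;  // null when the generator defines no partition

    private SamplingKey(String pipePath, String partition) {
        this.pipePath = Objects.requireNonNull(pipePath, "pipePath");
        this.partition = partition;
    }

    // Plain per-node key: the behaviour without sampling partitions.
    static SamplingKey of(String pipePath) {
        return new SamplingKey(pipePath, null);
    }

    // Key for one partition of a node, e.g. "new" or "old" invoices.
    static SamplingKey of(String pipePath, String partition) {
        return new SamplingKey(pipePath, Objects.requireNonNull(partition, "partition"));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SamplingKey)) {
            return false;
        }
        SamplingKey other = (SamplingKey) o;
        return pipePath.equals(other.pipePath) && Objects.equals(partition, other.partition);
    }

    @Override
    public int hashCode() {
        return Objects.hash(pipePath, partition);
    }

    @Override
    public String toString() {
        return partition == null ? pipePath : pipePath + "#" + partition;
    }
}
```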

> But I think Paulo's sampling groups idea has merit.


> </aside>
> Judson

Have fun,
