cocoon-dev mailing list archives

From "Paulo Gaspar" <>
Subject RE: Adaptive Caching [was Re: initial checkin of the Scheme code]
Date Sun, 16 Dec 2001 04:05:18 GMT

Answer inline:

> -----Original Message-----
> From: Antti Koivunen []
> Sent: Saturday, December 15, 2001 5:24 PM
> ...
>  > However, as you also mention, there is the cost of sampling. If you
>  > have a processing time expensive document "A" with a maximum cache
>  > lifetime of 24 hours that is usually requested 100 times a day...
>  > and then you sample how much time it takes to get it 100 times a
>  > day, the accuracy gets better but the cost of the sampling is as
>  > big as the cost of not caching at all.
> I'm not sure that I understand this. I think Stefano's idea was to use 
> normal requests for sampling. If the document is served a lot faster 
> from the cache, very few requests would cause it to be regenerated 
> (unless it's modified). The actual sampling calculations are very cheap 
> compared to I/O or XSLT operations.

Maybe you understood it. It is just as stupid as it sounds.

I was just saying that we (obviously) should not fetch a requested and
already cached resource from its origin again (as if there were no
cache) several times during a sampling period just to collect more
sampling data.

We would then have more accurate data (and hence better cache tuning),
but we would be bypassing the cache on every one of those requests.
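
A minimal sketch of sampling on normal requests only, so that timing is
recorded when a resource has to be generated anyway and never by forcing
a cached resource to be regenerated. The class and method names
(SamplingCache, get, invalidate, sampleCount) are hypothetical and are
not Cocoon APIs; it also assumes the generator never returns null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration, not Cocoon code: a cache that takes a
// timing sample only when a request misses and the resource must be
// generated anyway.
final class SamplingCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, Long> lastGenerationNanos = new HashMap<>();
    private int samples = 0;

    String get(String key, Supplier<String> generator) {
        String cached = cache.get(key);
        if (cached != null) {
            // Hit: serve from the cache, no sampling cost at all.
            return cached;
        }
        // Miss: the generation cost is paid anyway, so measure it.
        long start = System.nanoTime();
        String fresh = generator.get();
        long elapsed = System.nanoTime() - start;
        cache.put(key, fresh);
        lastGenerationNanos.put(key, elapsed);
        samples++;
        return fresh;
    }

    // Drops an entry, e.g. when the source was modified or expired.
    void invalidate(String key) {
        cache.remove(key);
    }

    // Number of timing samples taken so far.
    int sampleCount() {
        return samples;
    }

    // Last measured generation time for a key, or -1 if never sampled.
    long lastGenerationNanos(String key) {
        Long nanos = lastGenerationNanos.get(key);
        return nanos == null ? -1L : nanos;
    }
}
```

With this shape, 100 requests for document "A" within its cache lifetime
produce one sample, not 100. That is the trade-off discussed above: less
data per document, but the cache is actually used.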

>  >
>  > But usually you have families of documents that take a similar time
>  > to process, like:
>  >  - Articles with 4000 words or less without pictures stored in XML
>  >    files;
>  >  - Product records from a product catalog stored in a database;
>  >  - Invoices with less than 10 items from a database.
>  >
>  > If your time measurements are made per family, you will usually
>  > end up with a much wider set of sample data and hence much more
>  > representative results. The system use will generate the repeated
>  > samples and their distribution along the time (and along load peaks
>  > and low load periods) will tend to be much more representative than
>  > any other mechanism we could come up with.
> I understand the issue, but I'd like to see the process completely 
> automated. I think we might get enough information just by sampling the 
> time it takes to produce a resource (and perhaps the frequency of the 
> requests).

IMHO, completely automated means losing a lot of performance. I think 
you will not get much more than what you already get from a non-adaptive
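
The per-family measurement quoted earlier can be sketched roughly as
follows. This is only an illustration under assumptions: the family
names are assigned by whoever configures the system, and the class and
method names (FamilyTimingStats, record, meanMillis, sampleCount) are
hypothetical, not part of Cocoon. It keeps a running mean of generation
time per family, so every document in a family adds to the same pool of
samples.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Cocoon code: generation-time samples
// are pooled per document family instead of per document.
final class FamilyTimingStats {
    private static final class Stats {
        long count;
        double meanMillis;
    }

    private final Map<String, Stats> byFamily = new HashMap<>();

    // Adds one sample to the family's running mean.
    void record(String family, double millis) {
        Stats s = byFamily.computeIfAbsent(family, k -> new Stats());
        s.count++;
        s.meanMillis += (millis - s.meanMillis) / s.count;
    }

    // Mean generation time for the family, or NaN if it has no samples.
    double meanMillis(String family) {
        Stats s = byFamily.get(family);
        return s == null ? Double.NaN : s.meanMillis;
    }

    // Number of samples pooled for the family so far.
    long sampleCount(String family) {
        Stats s = byFamily.get(family);
        return s == null ? 0L : s.count;
    }
}
```

For example, calling record("article", ...) once for each of two
different articles leaves the "article" family with two samples, even
though each document was generated only once.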

> BTW, the Colt project ("Open Source Libraries for High Performance
> Scientific and Technical Computing in Java") provides several nice
> implementations of various collection types (and much more). Also the
> javadocs are some of the best I've ever seen :) And here's the link:
> (: Anrie ;)

I already knew that one but had lost track of it. Some stuff looks 

Have fun,
Paulo Gaspar
