From: "Paulo Gaspar"
To: cocoon-dev@xml.apache.org
Subject: RE: Adaptive Caching [was Re: initial checkin of the Scheme code]
Date: Sun, 16 Dec 2001 05:05:18 +0100

Answer inline:

> -----Original Message-----
> From: Antti Koivunen [mailto:anryoshi@users.sourceforge.net]
> Sent: Saturday, December 15, 2001 5:24 PM
>
> ...
>
> > However, as you also mention, there is the cost of sampling. If you
> > have a processing-time-expensive document "A" with a maximum cache
> > lifetime of 24 hours that is usually requested 100 times a day...
> > and then you sample how much time it takes to get it 100 times a
> > day, the accuracy gets better but the cost of the sampling is as
> > big as the cost of not caching at all.
>
> I'm not sure that I understand this. I think Stefano's idea was to use
> normal requests for sampling. If the document is served a lot faster
> from the cache, very few requests would cause it to be regenerated
> (unless it's modified). The actual sampling calculations are very cheap
> compared to I/O or XSLT operations.

Maybe you understood it. It is just as stupid as it sounds. =;o)

I was just saying that we (obviously) should not fetch a requested and
already cached resource again from its origin (as if there were no
cache) several times during a sampling period just to gather more
sampling data. We would then have more accurate data (and hence better
cache tuning), but we would not be using the cache on all those
requests.

> > But usually you have families of documents that take a similar time
> > to process, like:
> > - Articles with 4000 words or less, without pictures, stored in XML
> >   files;
> > - Product records from a product catalog stored in a database;
> > - Invoices with less than 10 items from a database.
> >
> > If your time measurements are made per family, you will usually
> > end up with a much wider set of sample data and hence much more
> > representative results. The system's own use will generate the
> > repeated samples, and their distribution over time (and across load
> > peaks and low-load periods) will tend to be much more representative
> > than anything we could achieve with any other mechanism.
>
> I understand the issue, but I'd like to see the process completely
> automated. I think we might get enough information just by sampling the
> time it takes to produce a resource (and perhaps the frequency of the
> requests).

IMHO, completely automated means losing a lot of performance. I think
you will not get much more than what you already get from a
non-adaptive cache.
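Just to make it a bit more concrete, here is a rough sketch in plain
Java (made-up names, not existing Cocoon code) of what I mean by keying
the measurements by family and piggy-backing the sampling on normal
requests:

  import java.util.HashMap;
  import java.util.Map;

  /**
   * Rough sketch only: keeps generation time statistics per "family"
   * of resources instead of per individual resource, and only takes a
   * sample when a resource really has to be (re)generated, so the
   * sampling piggy-backs on normal requests and never bypasses the
   * cache just to gather data.
   */
  public class FamilyCostSampler {

      /** Running statistics for one family of resources. */
      private static class Stats {
          long samples;
          long totalMillis;

          synchronized void add(long millis) {
              samples++;
              totalMillis += millis;
          }

          synchronized long average() {
              return (samples == 0) ? 0 : (totalMillis / samples);
          }
      }

      private final Map statsByFamily = new HashMap();

      private synchronized Stats statsFor(String family) {
          Stats stats = (Stats) statsByFamily.get(family);
          if (stats == null) {
              stats = new Stats();
              statsByFamily.put(family, stats);
          }
          return stats;
      }

      /**
       * Called by the cache around the real generation of a resource,
       * i.e. only on a cache miss or when the cached copy has expired.
       */
      public void recordGeneration(String family, long elapsedMillis) {
          statsFor(family).add(elapsedMillis);
      }

      /**
       * Estimated cost of regenerating a resource of this family,
       * which an adaptive cache can weigh against the cost of keeping
       * the family cached.
       */
      public long estimatedCost(String family) {
          return statsFor(family).average();
      }
  }

The point is that a sample only gets taken when the resource would have
to be generated anyway, so the sampling itself costs next to nothing,
and grouping by family gives a much larger and better distributed set
of samples than measuring each resource on its own.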
>...
>
> BTW, the Colt project ("Open Source Libraries for High Performance
> Scientific and Technical Computing in Java") provides several nice
> implementations of various collection types (and much more). Also the
> javadocs are some of the best I've ever seen :) And here's the link:
>
> http://tilde-hoschek.home.cern.ch/~hoschek/colt/
>
> (: Anrie ;)

I already knew that one but had lost track of it. Some stuff looks
interesting! =:o)

Have fun,
Paulo Gaspar

http://www.krankikom.de
http://www.ruhronline.de

---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org