httpd-dev mailing list archives

From Igor Tatarinov <tatar...@prairie.NoDak.edu>
Subject Re: Want to add file caching to Apache
Date Thu, 20 Nov 1997 17:31:20 GMT
Marc Slemko wrote:
> 
> On Thu, 20 Nov 1997, Ben Laurie wrote:
> 
> > Igor Tatarinov wrote:
> > > It is relatively easy to get a high hit ratio (>80%) in a Web server
> > > cache (see for example, http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps
> > > you don't need to read it entirely, just look at the graphs)
> >
> > This does not agree with a study Digital did (admittedly that was of
> > proxy caches). If I remember correctly (and it's entirely possible I
> > don't), they got < 40% hits.
> 
> I think that is a critical difference.
> 
> Consider your 2 meg porn site that gets a million hits a day.  A cache
> would certainly help that.
> 
> Consider your site that is a frontend to a multigigabyte database, where
> queries are spread reasonably evenly across the whole thing.  A cache
> would help that considerably less.
> 
> I would suggest it would be worthwhile modelling the hit rates using a
> simulation based on logfiles.  While you will have to fudge a few numbers
> (unless you log in a nonstandard form), it should be reasonable.

That's what I've been doing for almost a year:
http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps
http://www.cs.ndsu.nodak.edu/~tatarino/pubs/cache-policies.ps
http://www.cs.ndsu.nodak.edu/~tatarino/pubs/perf-analysis.ps

Log-based simulation has certain problems though. One small problem is
that the log does not tell you whether the requested file may be
cached or not. A related question: is there any way to check that in
the Apache request handler? I know that there is the no_cache field in
request_rec, but is it really used? I would guess that files with SSI
(and some CGI outputs) shouldn't be cached at all, but is there a
simple way to check for that?
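
For illustration only, here is a minimal sketch of the kind of
log-based simulation discussed above. It assumes the access log is in
Common Log Format; the names parse_requests and ideal_hit_ratio are
hypothetical and not taken from the papers linked above. It computes
the hit ratio an unbounded cache would achieve, and it has to treat
every successful GET as cacheable, which is exactly the information
the log does not provide.

```python
import re

# Common Log Format: host ident user [time] "METHOD path PROTO" status bytes
CLF = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)$')


def parse_requests(lines):
    """Yield (path, size) for each successful GET found in the log lines."""
    for line in lines:
        m = CLF.match(line.strip())
        if not m:
            continue  # skip malformed or non-CLF lines
        method, path, status, size = m.groups()
        # Assumption: every 200 GET is cacheable; the log cannot tell us.
        if method == "GET" and status == "200" and size != "-":
            yield path, int(size)


def ideal_hit_ratio(requests):
    """Hit ratio of an unbounded cache: any repeated path counts as a hit."""
    seen = set()
    hits = total = 0
    for path, _size in requests:
        total += 1
        if path in seen:
            hits += 1
        else:
            seen.add(path)
    return hits / total if total else 0.0
```

The result is only an upper bound: a cache of finite size, or one that
must skip SSI and CGI output, will do worse.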

The main problem with simulation is that you never know how difficult
it will be to implement what you are suggesting. This wouldn't be a
problem on a single-process, async I/O-based server (which does not
scale across multiple CPUs), but Apache is not of that ilk.
Because of the above, I will first try to implement a simple policy
like lru+threshold instead of something really smart. As a result,
files larger than the threshold will never be cached. Ideally, the
threshold should be tuned automatically.
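
To make the policy concrete, here is a rough sketch of lru+threshold
as a standalone simulation model. It is not Apache code, and it leaves
out the multi-process sharing that makes the real implementation hard.
The class name LRUThresholdCache and its parameters are hypothetical.
Sizes are in bytes, and the threshold is fixed rather than tuned
automatically.

```python
from collections import OrderedDict


class LRUThresholdCache:
    """LRU file cache that never admits files larger than a threshold."""

    def __init__(self, capacity_bytes, threshold_bytes):
        self.capacity = capacity_bytes
        self.threshold = threshold_bytes
        self.files = OrderedDict()  # path -> size, least recently used first
        self.used = 0               # bytes currently held in the cache

    def access(self, path, size):
        """Record one request; return True on a hit, False on a miss."""
        if path in self.files:
            self.files.move_to_end(path)  # mark as most recently used
            return True
        # Files over the threshold (or over the whole cache) are never cached.
        if size > self.threshold or size > self.capacity:
            return False
        # Evict least recently used files until the new one fits.
        while self.used + size > self.capacity:
            _, old_size = self.files.popitem(last=False)
            self.used -= old_size
        self.files[path] = size
        self.used += size
        return False
```

Feeding (path, size) pairs from an access log through access() and
counting the True results gives the simulated hit ratio for a chosen
capacity and threshold.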

igor
