httpd-dev mailing list archives

From Alexei Kosut <ako...@leland.Stanford.EDU>
Subject Re: Core server caching
Date Wed, 23 Sep 1998 21:37:31 GMT
On 23 Sep 1998, Ben Hyde wrote:

> The core problem of caching seems to me to get confused by the
> complexity of designing a caching proxy.  If one ignores that then the
> core problem of caching seems quite simple.

Actually, for an HTTP server, they're the same problem, if you want to be
able to cache any sort of dynamic request. And caching static requests is
mostly pointless (Dean's flow stuff notwithstanding): making copies of
static files in memory or on disk gains us little, since the OS can do it
better than we can.

> Let's say the core server has a cache consisting of a set of fragments
> that it can return as all or part of a response quickly.  The only
> operations on these entries might be create, delete, and send.  It
> might just barely be useful to have a rank attribute on these entries
> so the core had a hint which entries were believed hotter than others.


> Later they can stream out quickly by doing resend_cached_chunk(id).
> It isn't the core server's job to implement some wonderful cache
> management algorithm that fits all sizes of users.  That's a
> problem that ought to be left to the non-core parts of the site.
> If they want to invalidate the cache at press time, request time,
> restart, well that's up to them.

Invalidation isn't the problem. That's easy to punt to the provider of the
original data; but there are still a lot of other problems.

> It is the core server's job to implement the mapping from URL to
> response generator in a sufficiently quick and flexible way.  I don't
> see that the selection of a generator which uses the cache is a
> special case of that part of the design problem.

Nope; adding a cache greatly complicates things. Consider: if I have a
script-based object that returns a different response based on some
parameter of the request other than the URI, then without a cache the
core has an easy job: figure out that the URI maps to the script, run the
script, send the output to the client. With a cache, the core server
needs to do more: when the script outputs a response, the core needs to
find out what request dimensions the object (as a whole) depends on, and
which parameters - and acceptable variation of the parameters - of those
dimensions the specific current response uses. Then it needs to store all
that information. When another request for the same URI comes in, it
needs to compare the current request against all of the cached versions,
figure out whether the dimensions match, which variant matches best, and
whether it's necessary to call the script again.
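The matching step above - each cached response remembering which
dimensions it varies on and what values it was generated for - could be
sketched roughly like this. Again, hypothetical names only, and a real
server would pull the values out of the request headers rather than
parallel arrays:

```c
/* Sketch of variant matching: each cached response records which request
 * dimensions it depends on and the values it was generated for.
 * Hypothetical names; not Apache code. */
#include <string.h>

#define MAX_DIMS 4

typedef struct {
    const char *uri;
    const char *dim[MAX_DIMS];  /* e.g. "User-Agent", "Accept-Language" */
    const char *val[MAX_DIMS];  /* values this response was built for */
    int ndims;
    const char *body;
} variant;

/* Look up the current request's value for one dimension. */
static const char *req_get(const char **names, const char **vals, int n,
                           const char *want) {
    int i;
    for (i = 0; i < n; i++)
        if (strcmp(names[i], want) == 0)
            return vals[i];
    return NULL;
}

/* Return the first cached variant every dimension of which matches the
 * current request, or NULL if the script must be run again. */
const variant *match_variant(const variant *v, int nv,
                             const char **names, const char **vals, int nreq) {
    int i, d;
    for (i = 0; i < nv; i++) {
        int ok = 1;
        for (d = 0; d < v[i].ndims; d++) {
            const char *cur = req_get(names, vals, nreq, v[i].dim[d]);
            if (!cur || strcmp(cur, v[i].val[d]) != 0) { ok = 0; break; }
        }
        if (ok) return &v[i];
    }
    return NULL;
}
```

This sketch only does exact value comparison; handling "acceptable
variation" of a parameter (wildcards, q-values, ranges) is where the real
complexity - the caching-proxy complexity - comes in.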

That's the same thing an HTTP caching proxy needs to do. And if we don't
do all of that, then putting a cache into the core is rather pointless,
since it doesn't do anything other than let us either:

1. Only cache static content - i.e., content that only depends on the URL
and not on any other request dimensions.

2. Only return the cached response when the request is *exactly* the same
for the stated dimensions. Which, if you're varying on many useful things
(User-Agent, IP address, time of day, etc...), is completely useless, as
two requests will almost never be identical often enough to matter.
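To see why option 2 degenerates, picture the exact-match cache key as a
concatenation of every dimension the response depends on (a hypothetical
sketch, not Apache code):

```c
/* Sketch of an exact-match cache key built from every request dimension.
 * The more dimensions go into the key, the rarer it is for a second
 * request to produce the same key -- i.e., the cache almost never hits.
 * Hypothetical names; not Apache code. */
#include <stdio.h>
#include <string.h>

int make_key(char *buf, size_t n, const char *uri, const char *ua,
             const char *ip, int hour) {
    return snprintf(buf, n, "%s|%s|%s|%d", uri, ua, ip, hour);
}
```

Once the client's IP address or the time of day is part of the key, each
entry effectively serves one client once.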

The only other option is to add a cache to each individual module, but
that really seems like a bad idea.

> Note if the API the core has to caching is very small then it becomes
> a lot easier to implement variations of it behind the scenes.

-- Alexei Kosut <> <>
   Stanford University, Class of 2001 * Apache <> *
