httpd-dev mailing list archives

From "Justin Erenkrantz" <>
Subject Re: mod_cache and its ilk
Date Mon, 30 Oct 2006 17:26:18 GMT
On 10/30/06, Joe Orton <> wrote:
> A simple general-purpose disk cache which makes no assumptions about
> speed of backend, speed of storage or speed of clients; is
> single-threaded and does not involve any multi-process synchronisation
> beyond open/O_EXCL.  Specifically:

+1.  I would prefer that all bells and whistles be kept out of
mod_disk_cache and we provide another alternative that has more
complex behaviors and scales a little better for those who care.
(That was the whole reason behind a versioned provider interface to
begin with!)

> 1) cannot write entire response to disk for any content type before
> sending anything to the client; filter acts by writing to cache and
> client synchronously

My concern with this is we should be careful not to teach the
providers that they are sitting in an output filter chain.

In other words, I would prefer that we create an abstraction that does
not force the providers to have to worry about passing brigades down a
filter chain or dealing with filter errors.  It just needs to stash
some data away and that's all I want the provider to be concerned
with.

Perhaps we provide an optional continuation function to the
store_body() analogue that the provider can call after it writes a
bucket which mod_cache can then pass along down the filter chain on
its behalf.  Otherwise, we'd have way too much duplicate code to deal
with and that concerns me.
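
A minimal sketch of the continuation idea, assuming a toy in-memory
store in place of a disk provider.  Every name here (toy_store,
toy_store_body, store_continue_fn, the result codes) is hypothetical
and is not the mod_cache provider interface.  The provider only stashes
data and then invokes the callback; the caller decides what the
callback does with the chunk.  Forwarding even when storage fails is a
choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical continuation: invoked by the provider once per chunk so
 * the caller can pass that chunk down its own filter chain. */
typedef int (*store_continue_fn)(void *ctx, const char *data, size_t len);

/* Toy provider state: a fixed buffer standing in for the cache file. */
typedef struct {
    char   buf[256];
    size_t used;
} toy_store;

/* Distinct codes so the caller can tell a storage failure from a
 * failure further down the chain. */
enum { STORE_OK = 0, STORE_FAILED = 1, FORWARD_FAILED = 2 };

int toy_store_body(toy_store *st, const char *data, size_t len,
                   store_continue_fn cont, void *cont_ctx)
{
    int rc = STORE_OK;

    assert(st != NULL && data != NULL);

    if (len > sizeof(st->buf) - st->used) {
        rc = STORE_FAILED;              /* storage problem only */
    } else {
        memcpy(st->buf + st->used, data, len);
        st->used += len;
    }

    /* Hand the chunk back to the caller whether or not it was stored
     * (an assumption of this sketch). */
    if (cont != NULL && cont(cont_ctx, data, len) != 0)
        return FORWARD_FAILED;

    return rc;
}

/* Example continuation: counts the bytes that were forwarded. */
int count_forwarded(void *ctx, const char *data, size_t len)
{
    (void)data;
    *(size_t *)ctx += len;
    return 0;
}
```

The provider never sees a brigade or a filter; it only knows that it
was handed bytes and a function to call afterwards.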

The implementation that was recently checked in makes it so that we
cannot distinguish between a failed cache storage and a filter error.
I'd like to keep that distinction, and not have a failed cache storage
affect the client.  See my add'l item #5 below.

> 2) avoids writing a given URL/variant to the cache more than once
> simultaneously using open/O_EXCL

There are problems with relying upon O_EXCL.  mod_disk_cache
purposefully lets the race condition happen as without inter-process
sync, it's not really easy to know who is actually around and is
likely to finish.  So, mod_disk_cache lets the last guy to succeed
'win'.  I think this leads to a far simpler design within
mod_disk_cache (it trades writing duplicates of the cached entity for
a higher chance of success), but I'd be curious to see if you can come
up with a scalable design using O_EXCL.  I tried before and never came
up with a clean or scalable design as the solutions with O_EXCL placed
too much faith in the fact that one selected client will actually
finish.  Although, at the end of the day, I'm willing to sacrifice
that (and let another provider solve it), but I'd like to at least
explore solutions that don't.
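
One common way to get "last guy to succeed wins" without inter-process
sync is to write each copy to a private temporary file and rename() it
over the final name.  The sketch below uses plain POSIX calls and a
hypothetical helper name (publish_entry); it illustrates the trade-off
described above and is not presented as mod_disk_cache's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write data to a private temp file next to final_path, then rename it
 * into place.  Concurrent callers may each write a full copy; whoever
 * renames last is the one readers see.  Returns 0 on success, -1 on
 * failure. */
int publish_entry(const char *final_path, const char *data, size_t len)
{
    char tmp[4096];
    size_t off = 0;
    int n, fd;

    assert(final_path != NULL && data != NULL);

    n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", final_path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    fd = mkstemp(tmp);                  /* unique name, no shared state */
    if (fd < 0)
        return -1;

    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w <= 0) {                   /* give up on any write problem */
            close(fd);
            unlink(tmp);
            return -1;
        }
        off += (size_t)w;
    }

    if (close(fd) != 0 || rename(tmp, final_path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A writer that dies halfway leaves only its own temp file behind, so
nobody else depends on it finishing; the cost is the duplicated writes.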

> 3) for the case where multiple simultaneous requests are made for an
> uncached but cacheable entity, will simply pass-through all requests for
> which the open/O_EXCL fails and does not attempt to read from an
> as-yet-incomplete cache file

+1 modulo if you can get a clean design per #2 above.
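
For comparison, the open/O_EXCL behaviour in #2 and #3 could look
roughly like this.  claim_entry and the claim_result values are
hypothetical names for the sketch; only open(2) and its flags are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

typedef enum {
    CLAIM_WRITER,        /* we created the file and should fill it   */
    CLAIM_PASS_THROUGH,  /* someone else holds it: skip the cache     */
    CLAIM_ERROR          /* unrelated failure (permissions, ENOSPC..) */
} claim_result;

/* Try to become the single writer for path.  On CLAIM_WRITER, *fd_out
 * is an open descriptor; otherwise it is set to -1.  The losing caller
 * never waits and never reads the incomplete file. */
claim_result claim_entry(const char *path, int *fd_out)
{
    int fd;

    assert(path != NULL && fd_out != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        *fd_out = fd;
        return CLAIM_WRITER;
    }

    *fd_out = -1;
    return errno == EEXIST ? CLAIM_PASS_THROUGH : CLAIM_ERROR;
}
```

The sketch deliberately leaves out what happens when the winning writer
never finishes: the file stays in place and every later request passes
through, which is the weakness of O_EXCL raised under #2.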

I just noticed that someone introduced an "open_header_timeout" which
incurs a penalty when a partial file is present (!).  So, let me add
the following constraint derived from #3:

4) No penalties are applied to the client when we do not have a
complete cached entity on hand.

Sleeping is never an acceptable solution.  If we don't have it, it's
okay to serve directly and avoid the cache.  mod_disk_cache takes the
opportunity to try to recache, but avoiding the cache is okay too.
Both are acceptable solutions, IMO.  Sleeping is not.

And, derived from the issue with the failure scenario:

5) If writing to the cache fails, the client doesn't need to know
that, but the admin should get a nice note somewhere.

On the surface, this speaks against the storage operation being a
filter in the traditional sense.
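
A rough sketch of what #5 might mean in code, assuming a hypothetical
"tee" that owns both the storage callback and the client-side callback.
All names (cache_tee, cache_tee_write, chunk_fn) are invented for
illustration, and stderr stands in for the server error log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*chunk_fn)(void *ctx, const char *data, size_t len);

typedef struct {
    chunk_fn store; void *store_ctx;   /* cache side, allowed to fail */
    chunk_fn send;  void *send_ctx;    /* client side                 */
    int      store_dead;               /* set after first store error */
} cache_tee;

/* Store the chunk if the cache is still healthy, then always send it.
 * Only the send result is returned, so a storage failure never reaches
 * the client; it is logged once and caching stops for this response. */
int cache_tee_write(cache_tee *t, const char *data, size_t len)
{
    assert(t != NULL && t->send != NULL);

    if (t->store != NULL && !t->store_dead) {
        if (t->store(t->store_ctx, data, len) != 0) {
            t->store_dead = 1;
            fprintf(stderr, "cache: store failed, entry abandoned\n");
        }
    }
    return t->send(t->send_ctx, data, len);
}

/* Example callbacks: a store that always fails and a sender that
 * counts delivered bytes. */
int always_fail(void *ctx, const char *data, size_t len)
{
    (void)ctx; (void)data; (void)len;
    return -1;
}

int count_bytes(void *ctx, const char *data, size_t len)
{
    (void)data;
    *(size_t *)ctx += len;
    return 0;
}
```

With always_fail as the store, the client still receives every byte and
the log gets a single line, which keeps the two failure kinds apart.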

> A disk cache which makes different assumptions about speed of backend or
> uses more complex caching tricks should be in the tree as a different
> provider in a module with a different name; the exact assumptions made
> need to be well documented.  (Maybe something like
> "mod_local_disk_cache" would be appropriate for the name?)
> Needless to say, code making bogus assumptions about the output
> filtering interface has no place in the tree and fails the "correctness"
> test.

+1.  -- justin
