httpd-dev mailing list archives

From Graham Leggett <minf...@sharp.fm>
Subject Re: mod_cache and its ilk
Date Mon, 30 Oct 2006 18:38:39 GMT
Justin Erenkrantz wrote:

> +1.  I would prefer that all bells and whistles be kept out of
> mod_disk_cache and we provide another alternative that has more
> complex behaviors that scales a little better for those who care.
> (That was the whole reason behind a versioned provider interface to
> begin with!)

In that case I suggest we revert mod_disk_cache back to how it was 
before the large-file patches (but after the sets of bugfixes), and 
move the current modified mod_disk_cache to mod_large_disk_cache for 
further review?

>> 1) cannot write entire response to disk for any content type before
>> sending anything to the client; filter acts by writing to cache and
>> client synchronously
> 
> My concern with this is we should be careful not to teach the
> providers about the fact that it is sitting in an output filter chain.
> 
> In other words, I would prefer that we create an abstraction that does
> not force the providers to have to worry about passing brigades down a
> filter chain or dealing with filter errors.  It just needs to stash
> some data away and that's all I want the provider to be concerned
> with.
> 
> Perhaps we provide an optional continuation function to the
> store_body() analogue that the provider can call after it writes a
> bucket which mod_cache can then pass along down the filter chain on
> its behalf.  Otherwise, we'd have way too much duplicate code to deal
> with and that concerns me.
> 
> The implementation that was recently checked in makes it so that we
> can not distinguish between a failed cache storage and a filter error.
> I'd like to be able to maintain knowing the difference and not having
> a failed cache storage affect the client.  See my add'l item #5 below.
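
As I read it, the continuation-style interface described above would 
look roughly like the sketch below. The names and exact signature here 
are hypothetical, loosely following the existing store_body() provider 
hook:

#include "httpd.h"        /* request_rec */
#include "apr_buckets.h"
#include "mod_cache.h"    /* cache_handle_t */

/* Hypothetical continuation handed to the store_body() analogue.  The
 * provider calls it after writing each bucket; mod_cache implements it
 * by passing the data down the filter chain, so the provider never has
 * to worry about brigades, filter chains or filter errors itself. */
typedef apr_status_t (*cache_continuation_fn)(void *baton, apr_bucket *e);

/* Illustrative shape of the provider hook; continue_fn may be NULL
 * when mod_cache chooses to forward the data itself. */
apr_status_t (*store_body)(cache_handle_t *h, request_rec *r,
                           apr_bucket_brigade *bb,
                           cache_continuation_fn continue_fn,
                           void *baton);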

My original solution for this was, inside mod_cache, to loop through the 
brigade, and to split buckets larger than a threshold before giving the 
buckets to cache_body() and the network in turn.

cache_body() would then be guaranteed that buckets would never exceed a 
certain size.

The certain size was configurable via a directive.

At the time I did not know that the apr_bucket_read() function did 
exactly this anyway - this behaviour is undocumented in doxygen - and 
apparently 4MB is the maximum size that apr_bucket_read() will return.

Based on this, we could remove the directive and default it to 4MB (or 
the same constant used in apr_bucket_read()), and use apr_bucket_split.

This has the advantage of keeping the API as clean as before, and it 
ensures that cache providers don't have to care about large buckets.

The cache also doesn't have to care how 4.7GB of buckets are going to be 
stored temporarily between write-to-cache and write-to-network, as the 
buckets so split are deleted after each iteration.
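
Something along these lines illustrates the idea. This is only a sketch, 
not the code that was committed; CACHE_SPLIT_THRESHOLD and 
store_one_bucket() are made-up names:

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

#define CACHE_SPLIT_THRESHOLD (4 * 1024 * 1024) /* same order as apr_bucket_read() */

static apr_status_t split_and_store(ap_filter_t *f, apr_bucket_brigade *bb,
                                    apr_status_t (*store_one_bucket)(void *, apr_bucket *),
                                    void *provider_ctx)
{
    apr_bucket_brigade *tmp = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
    apr_status_t rv;

    while (!APR_BRIGADE_EMPTY(bb)) {
        apr_bucket *e = APR_BRIGADE_FIRST(bb);

        if (!APR_BUCKET_IS_METADATA(e)) {
            /* Buckets of unknown length (pipes, sockets) are read first so
             * that e->length becomes real; apr_bucket_read() caps what it
             * returns, so this never pulls a whole huge body into memory. */
            if (e->length == (apr_size_t)-1) {
                const char *ignored;
                apr_size_t len;
                rv = apr_bucket_read(e, &ignored, &len, APR_BLOCK_READ);
                if (rv != APR_SUCCESS) {
                    return rv;
                }
            }
            /* Anything still over the threshold is split, so the provider
             * is guaranteed never to see an oversized bucket. */
            if (e->length > CACHE_SPLIT_THRESHOLD) {
                apr_bucket_split(e, CACHE_SPLIT_THRESHOLD);
            }
        }

        /* Write this bucket to the cache first ... */
        rv = store_one_bucket(provider_ctx, e);
        if (rv != APR_SUCCESS) {
            return rv;
        }

        /* ... then send the same bucket to the client and let it go, so
         * only one threshold-sized chunk is ever held at a time. */
        APR_BUCKET_REMOVE(e);
        APR_BRIGADE_INSERT_TAIL(tmp, e);
        rv = ap_pass_brigade(f->next, tmp);
        if (rv != APR_SUCCESS) {
            return rv;
        }
        apr_brigade_cleanup(tmp);
    }

    return APR_SUCCESS;
}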

>> 2) avoids writing a given URL/variant to the cache more than once
>> simultaneously using open/O_EXCL
> 
> There's problems with relying upon O_EXCL.  mod_disk_cache
> purposefully lets the race condition happen as without inter-process
> sync, it's not really easy to know who is actually around and is
> likely to finish.  So, mod_disk_cache lets the last guy to succeed
> 'win'.  I think this leads to a far simpler design within
> mod_disk_cache (it trades writing duplicates of the cached entity for
> a higher chance of success), but I'd be curious to see if you can come
> up with a scalable design using O_EXCL.  I tried before and never came
> up with a clean or scalable design as the solutions with O_EXCL placed
> too much faith in the fact that one selected client will actually
> finish.  Although, at the end of the day, I'm willing to sacrifice
> that (and let another provider solve it), but I'd like to at least
> explore solutions that don't.

I'm not convinced the original last-guy-wins design was a good idea. On 
sites with high load, it meant many simultaneous attempts to cache the 
same entity would race to complete, with number_of_attempts * size of 
disk space used unnecessarily, and a potentially large load spike to the 
backend.

People with high load sites cite this as a key reason not to use the 
cache, as the resultant load spike to their backends cannot be sustained.

The large_disk_cache has made an attempt to address this issue; where 
the above is a problem, large_disk_cache can step in.
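
For what it's worth, the O_EXCL approach need not be complex. A sketch 
of the flow follows - path handling and cleanup are omitted, and the 
surrounding variables (path, r, f, bb) are assumed from the filter 
context:

apr_file_t *tmpfile;
apr_status_t rv = apr_file_open(&tmpfile, path,
                                APR_WRITE | APR_CREATE | APR_EXCL,
                                APR_OS_DEFAULT, r->pool);
if (APR_STATUS_IS_EEXIST(rv)) {
    /* Someone else is already caching this URL/variant: don't cache,
     * don't sleep, just serve the backend response to the client. */
    ap_remove_output_filter(f);
    return ap_pass_brigade(f->next, bb);
}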

>> 3) for the case where multiple simultaneous requests are made for an
>> uncached but cacheable entity, will simply pass-through all requests for
>> which the open/O_EXCL fails and does not attempt to read from an
>> as-yet-incomplete cache file
> 
> +1 modulo if you can get a clean design per #2 above.
> 
> I just noticed that someone introduced an "open_header_timeout" which
> incurs a penalty when a partial file is present (!).  So, let me add
> the following constraint derived from #3:
> 
> 4) No penalties are applied to the client when we do not have a
> complete cached entity on hand.
> 
> Sleeping is never an acceptable solution.  If we don't have it, it's
> okay to serve directly and avoid the cache.  mod_disk_cache takes the
> opportunity to try to recache, but avoiding the cache is okay too.
> Both are acceptable solutions, IMO; sleeping is an absolute no-no.

This was pointed out last week by Joe, and it was suggested that a 
notifier API be introduced to APR, allowing far more intelligent 
handling of this condition.

> And, derived from the issue with the failure scenario:
> 
> 5) If writing to the cache fails, the client doesn't need to know
> that, but the admin should get a nice note somewhere.
> 
> On the surface, this speaks against the storage operation being a
> filter in the traditional sense.

If the write-to-cache fails, a message is logged, the half cached file 
is deleted, the cache_save filter steps out of the way, and the rest of 
the brigade continues up the stack - being a filter gives us access to 
this graceful fallback mechanism.
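
In code the fallback amounts to roughly the following sketch; 
cache_remove_partial() is a placeholder for whatever the provider uses 
to discard its temporary file:

rv = cache->provider->store_body(cache->handle, r, bb);
if (rv != APR_SUCCESS) {
    /* Note for the admin, throw away the half-written entry, and step
     * out of the filter chain so the client never notices. */
    ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r,
                  "cache: store_body failed, not caching this response");
    cache_remove_partial(cache->handle, r);  /* placeholder helper */
    ap_remove_output_filter(f);
}
/* Either way the rest of the brigade carries on to the client. */
return ap_pass_brigade(f->next, bb);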

>> A disk cache which makes different assumptions about speed of backend or
>> uses more complex caching tricks should be in the tree as a different
>> provider in a module with a different name; the exact assumptions made
>> need to be well documented.  (Maybe something like
>> "mod_local_disk_cache" would be appropriate for the name?)
>>
>> Needless to say, code making bogus assumptions about the output
>> filtering interface has no place in the tree and fails the "correctness"
>> test.
> 
> +1.  -- justin

Instead of making assumptions, however, querying the actual status of 
the output filtering interface does have significant value, and it 
becomes possible with a notifier API.

All the cache needs to know is "will it block if I pass this output 
brigade up the stack?". If the answer is yes, the brigade can be set 
aside for transmission on the next iteration. This copies the same 
technique used in ap_core_output_filter() now.
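
As a sketch, assuming a hypothetical downstream_would_block() query 
that such a notifier API could provide (it does not exist today), the 
filter side would look roughly like this:

/* ctx->saved is a brigade the filter keeps across invocations. */
if (downstream_would_block(f)) {            /* hypothetical notifier query */
    /* Don't let a slow client throttle the write-to-cache: set the data
     * aside and send it on the next invocation, much as
     * ap_core_output_filter() sets aside what it cannot write. */
    return ap_save_brigade(f, &ctx->saved, &bb, f->r->pool);
}
return ap_pass_brigade(f->next, bb);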

The ability to cache the response at a rate higher than, and independent 
of, the downstream network write (and without resorting to dodgy tricks 
like threads or forks) is the cornerstone of mod_large_disk_cache.

Regards,
Graham
--
