httpd-dev mailing list archives

From "Graham Leggett" <>
Subject Re: mod_disk_cache summarization
Date Tue, 24 Oct 2006 12:47:09 GMT
On Tue, October 24, 2006 2:22 pm, Joe Orton wrote:

>> In essence, the patches solve the thundering herd problem.
> I still think it's fundamentally wrong to try to "fix" that problem in
> this way.  It seems like the cache is being re-implemented to optimize
> for some very specific deployment scenarios, which scares me quite a
> lot.

The thundering herd problem has been well documented since it first
appeared as a bug against v1.3's mod_proxy in around 1998 or 1999. As soon
as you try to use httpd in any environment with high load, the thundering
herd problem bites.

People who run high load sites shouldn't have to write their own cache
module or patch httpd before mod_cache works, and that's exactly what's
happened here. Twice.

The problem is present in other caches as well, so you get really lame
behaviour like N Windows PCs all fetching the same system update at the
same time through the same transparent proxy, with the download done N
times over because every download was started before the first one to
complete had cached the file.

Workarounds like fiddling with expiry times help, but they aren't a
permanent solution, and they certainly aren't a solution to the scenario

> IMO: for a general purpose cache it is not appropriate to stop and try
> to write the entire response to the cache before serving anything.

Correct, that is the next problem to solve.

> Neither is it appropriate to have any process do the "sleep and stat"
> loop waiting for some other process to finish writing a cache file.

Correct, thus a notify API was suggested, which needs to be added to APR.
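
For reference, the "sleep and stat" pattern objected to above looks roughly
like the sketch below. The function name, polling interval and timeout are
hypothetical, chosen for illustration only; this is not code from
mod_disk_cache or from the patches under discussion.

```python
import os
import time


def wait_for_cache_file(path, poll_interval=0.05, timeout=5.0):
    """Poll until `path` exists or `timeout` seconds pass (hypothetical helper).

    Returns True if the file appeared, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.stat(path)          # the "stat" half of the loop
            return True
        except FileNotFoundError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)  # the "sleep" half of the loop
```

Every waiter wakes up on a timer whether or not anything has changed, which
is the cost a notify mechanism would be meant to avoid.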

> And
> certainly having the cache fork threads/processes so it can internally
> cache and serve simultaneously is the most scary idea of all.

Correct, which is why it wasn't committed. I am looking for a way for the
network filter to do non blocking writes. The notify API will be useful
here as well.

> The cache can be simple and correct by using the open/O_EXCL logic to
> avoid caching the same URL simultaneously in multiple processes.  In the
> case where the open gives EEXIST, the cache filters should just get out
> of the way and let the resource be served normally.
> I think the only feasible approach to mitigating the "thundering herd"
> expiry problem is to use the fuzzy expiry logic Brian described in his
> talk: offsetting the expiry time by some random offset in each process.
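
The two techniques named in the quoted text can be sketched as follows. This
is a minimal illustration under stated assumptions, not the mod_disk_cache
implementation: both function names are hypothetical, and the choice to move
the expiry only earlier (never later) is an assumption, since the quoted
text does not say in which direction the offset goes.

```python
import os
import random


def try_claim_cache_file(path):
    """Try to become the one process that caches this entry (hypothetical).

    Returns an open file descriptor on success, or None if another process
    already created the file and the caller should serve uncached.
    """
    try:
        # O_CREAT | O_EXCL makes creation atomic: only one caller succeeds.
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        # The open failed with EEXIST: get out of the way.
        return None


def fuzzy_expiry(expires_at, max_offset=30.0, rng=random):
    """Return an expiry time moved earlier by a random amount (hypothetical).

    Assumption: the offset is drawn uniformly from [0, max_offset] seconds
    and subtracted, so an entry is never treated as fresh past `expires_at`.
    """
    return expires_at - rng.uniform(0.0, max_offset)
```

With a different offset in each process, the processes stop treating the
entry as fresh at different moments, so they are less likely to all
revalidate at once. The O_EXCL claim only prevents duplicate cache writes;
requests that lose the race still go to the backend.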

If people are rewriting or patching mod_disk_cache, then mod_disk_cache is
not correct. Thundering herd is a difficult problem to solve, which is why
it simply hasn't been solved in most caches, including ours. But
implementing sub-optimal workarounds is not a permanent solution.

I would rather be more complex and correct, than simple and broken.

