From: "Graham Leggett"
To: dev@httpd.apache.org
Subject: Re: mod_disk_cache summarization
Date: Tue, 24 Oct 2006 14:47:09 +0200 (SAST)
Message-ID: <30700.196.8.104.27.1161694029.squirrel@www.sharp.fm>
In-Reply-To: <20061024122227.GB3600@redhat.com>
References: <453D1694.1010203@turner.com> <453D220E.3000706@sharp.fm> <20061024122227.GB3600@redhat.com>

On Tue, October 24, 2006 2:22 pm, Joe Orton wrote:

>> In essence, the patches solve the thundering herd problem.
>
> I still think it's fundamentally wrong to try to "fix" that problem in
> this way. It seems like the cache is being re-implemented to optimize
> for some very specific deployment scenarios, which scares me quite a
> lot.

The thundering herd problem has been well documented since it first
appeared as a bug against v1.3's mod_proxy in around 1998 or 1999.

As soon as you try to use httpd in any environment with high load, the
thundering herd problem bites. People who run high load sites shouldn't
have to write their own cache module or patch httpd before mod_cache
works, and that's exactly what's happened here. Twice.

The problem is present in other caches as well, so you get really lame
behaviour like N Windows PCs all fetching the same system update at the
same time through the same transparent proxy, but the download is done
N times over, because every download was started before the first one
to complete had cached the file.

Workarounds like fiddling with expiry times help, but they aren't a
permanent solution, and they certainly aren't a solution to the scenario
above.

> IMO: for a general purpose cache it is not appropriate to stop and try
> to write the entire response to the cache before serving anything.

Correct, that is the next problem to solve.

> Neither is it appropriate to have any process do the "sleep and stat"
> loop waiting for some other process to finish writing a cache file.

Correct, which is why a notify API was suggested; it needs to be added
to APR.
> And certainly having the cache fork threads/processes so it can
> internally cache and serve simultaneously is the most scary idea of
> all.

Correct, which is why it wasn't committed. I am looking for a way for
the network filter to do non-blocking writes. The notify API will be
useful here as well.

> The cache can be simple and correct by using the open/O_EXCL logic to
> avoid caching the same URL simultaneously in multiple processes. In the
> case where the open gives EEXIST, the cache filters should just get out
> of the way and let the resource be served normally.
>
> I think the only feasible approach to mitigating the "thundering herd"
> expiry problem is to use the fuzzy expiry logic Brian described in his
> talk: offsetting the expiry time by some random offset in each process.

If people are rewriting or patching mod_disk_cache, then mod_disk_cache
is not correct.

Thundering herd is a difficult problem to solve, which is why it simply
hasn't been solved in most caches, including ours. But implementing
sub-optimal workarounds is not a permanent solution.

I would rather be more complex and correct than simple and broken.

Regards,
Graham
--
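
For reference, the two alternatives proposed in the quoted text (an exclusive create to decide which process caches a URL, and a per-process random offset on the expiry time) can be sketched roughly as below. This is only an illustration in Python rather than httpd's C, and it does not reflect mod_disk_cache's actual code. The function names `try_claim_cache_file` and `fuzzy_expiry` are hypothetical, and shifting the expiry only earlier, by a uniform amount, is an assumption: the message does not say in which direction or by how much the offset is applied. The POSIX error returned when an exclusive create loses the race is `EEXIST`.

```python
import errno
import os
import random
import tempfile


def try_claim_cache_file(path):
    """Try to become the single writer for the cache file at `path`.

    Returns an open file descriptor if this caller created the file,
    or None if the file already exists (another writer got there first).
    Hypothetical helper, not part of httpd or APR.
    """
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one caller succeeds.
        return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            # Someone else is caching this entry; get out of the way.
            return None
        raise


def fuzzy_expiry(expiry, max_offset, rng=random):
    """Return `expiry` moved earlier by a random amount in [0, max_offset].

    Assumption: the offset only makes an entry expire sooner, so that
    processes do not all see the entry expire at the same instant.
    """
    return expiry - rng.uniform(0.0, max_offset)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        entry = os.path.join(cache_dir, "entry.data")
        winner = try_claim_cache_file(entry)   # first caller creates the file
        loser = try_claim_cache_file(entry)    # second caller sees EEXIST
        print("first claim succeeded:", winner is not None)
        print("second claim refused:", loser is None)
        os.close(winner)
    print("expiry with jitter:", fuzzy_expiry(1000.0, 30.0))
```

In this sketch a caller that gets `None` back skips caching and lets the response be served normally, so the duplicate backend requests still take place; only the duplicate cache writes are avoided.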