From: "Sergio Leonardi"
To: dev@httpd.apache.org
Subject: RE: [PATCH] mod_disk_cache deterministic tempfiles
Date: Fri, 19 Aug 2005 12:39:21 +0200
In-Reply-To: <20050818180947.GA26713@stdlib.net>

Hi all,

I'm thinking of a possible solution to these problems; let me know if this
makes sense.
Please note that I may suggest things that are already implemented, because
I haven't had time to study the module extensively yet.

The cache module could be split into two parts to implement a
producer-consumer model:

1 - mod_cache: the "front end" module, the one that accepts requests and
serves content from the cache
2 - mod_cache_requester: the "back end" module, the one that accepts
requests from mod_cache whenever it needs to regenerate pages; it will make
use of regeneration timeouts, discussed further below

Up to this point there is no news; let me go deeper.

Whenever mod_cache needs a page generated, it contacts mod_cache_requester,
inserting the needed data into a two-level priority queue:

level 1: reserved for requests generated by browsers
level 2: reserved for requests generated by internal cache regeneration
(i.e. because expiration time is approaching)

In the data structure used, we can put all incoming requests for the same
URL into the same "request block" (in a way similar to file system
drivers), just to keep track of how many processes/threads requested it, in
order to contact them when content generation has completed.
For each request block:

1 - mark the request block as "generating" instead of "waiting for
generation"
2 - take the highest-priority request for this request block
3 - use a configurable default overall generation timeout and a default
block-by-block generation timeout (the latter can lead to a longer
effective timeout if content data bytes keep arriving from the back end);
this is useful for "heavy" content
4 - perform the request to the back end using the above default timeouts
5 - the back end response header can contain some "special" variables that
can be used to update the above timeout values before serving content data
(to allow specific pages to have different timeout values, up to the back
end programmer/administrator); if present, apply these values and remove
the special variables from the response header
6 - put the content in the cache
7 - contact each mod_cache thread/process, telling it that the content is
ready to be served

If a timeout occurs between steps 4 and 6:

1 - contact only the thread/process that generated the current request of
this request block (the one selected in step 2), giving up
2 - remove this request from the request block
3 - if the request block is empty, remove it from the queue and exit
4 - if the request block is not empty, mark it as "waiting for
generation", giving the other requests in the same block a chance to be
generated

NOTE 1: Obviously, by default level 1 is processed "before" level 2 (or
they can use two separate connection pools to the back end, with the first
one larger).

NOTE 2: If a request is generated at level 2 and is not served before the
content expires, a browser may ask for the content and generate a level 1
request for it. This can be managed by "escalating" the original request
from level 2 to level 1 (using the browser's request data instead of the
data used by internal regeneration).

What do you think about it?
It could be a way to limit requests to the back end in terms of:

- number of requests: avoiding back end saturation during load peaks
- duplicate requests: avoiding back end saturation during regeneration time

Please forget it if I said something OT.

Bye
Sergio

-----Original Message-----
From: Colm MacCarthaigh [mailto:colm@stdlib.net]
Sent: Thursday 18 August 2005 20:10
To: dev@httpd.apache.org
Subject: Re: [PATCH] mod_disk_cache deterministic tempfiles

On Thu, Aug 18, 2005 at 02:00:52PM -0400, Brian Akins wrote:
> Colm MacCarthaigh wrote:
>
> >So mtime not being recent is no indication of death, it could easily be
> >a trickling download.
>
> True. But, if the file's mtime has not changed in 120 seconds (for
> example) the download is probably hung?

120-second stalls aren't uncommon, and there are plenty of overloaded
servers and terrible CGIs that have those kinds of response times. A major
use of mod_cache is to solve just that problem, but any approach - no
matter what number of seconds you pick - will always introduce
inefficiency.

If you pick a value that is too low, you'll never cache slow-to-serve
content. If you pick a value that is too high, you'll end up sending a lot
of requests to the (already slow) backend.

There might be a solution in using the scoreboard.

--
Colm MacCárthaigh                        Public Key: colm+pgp@stdlib.net