Date: Fri, 13 Aug 2004 14:51:23 +0100
From: Joe Orton
To: dev@httpd.apache.org
Subject: [PATCH] fix child reclaim timing
Message-ID: <20040813135123.GA18095@redhat.com>

The 2.0 ap_reclaim_child_processes logic seems to be broken - it never
resets the waittime variable as it did in 1.3; so the parent will wait
for up to 23 minutes (sic) in total for a stuck child process.
(SIGSTOP a child and strace the parent to see for yourself)

This updates the logic to be a little more sane:

- at t + 16, 82, 344 ms, just waitpid()
- at t + 360, 425, 688 ms, waitpid() else SIGTERM the child
- at t + 1.74 secs, waitpid() else SIGKILL the child
- at t + 1.75, 1.82 secs, just waitpid()
- at t + 2.08 secs, waitpid() else log "this child won't die"

Any comments?

Index: mpm_common.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm_common.c,v
retrieving revision 1.120
diff -u -r1.120 mpm_common.c
--- mpm_common.c	15 Mar 2004 23:08:41 -0000	1.120
+++ mpm_common.c	13 Aug 2004 13:42:47 -0000
@@ -70,7 +70,7 @@
 
     ap_mpm_query(AP_MPMQ_MAX_DAEMON_USED, &max_daemons);
 
-    for (tries = terminate ? 4 : 1; tries <= 9; ++tries) {
+    for (tries = terminate ? 4 : 1; tries <= 10; ++tries) {
         /* don't want to hold up progress any more than
          * necessary, but we need to allow children a few moments to exit.
          * Set delay with an exponential backoff.
@@ -98,13 +98,15 @@
             switch (tries) {
             case 1:     /* 16ms */
             case 2:     /* 82ms */
+                break;
+
             case 3:     /* 344ms */
-            case 4:     /* 16ms */
+                waittime = 16 * 1024;
                 break;
-
-            case 5:     /* 82ms */
-            case 6:     /* 344ms */
-            case 7:     /* 1.4sec */
+
+            case 4:     /* 360ms */
+            case 5:     /* 425ms */
+            case 6:     /* 688ms */
                 /* ok, now it's being annoying */
                 ap_log_error(APLOG_MARK, APLOG_WARNING,
                              0, ap_server_conf,
@@ -114,7 +116,7 @@
                 kill(pid, SIGTERM);
                 break;
 
-            case 8:     /* 6 sec */
+            case 7:     /* 1.74 sec */
                 /* die child scum */
                 ap_log_error(APLOG_MARK, APLOG_ERR,
                              0, ap_server_conf,
@@ -132,9 +134,14 @@
                  */
                 kill_thread(pid);
 #endif
+                waittime = 16 * 1024;
+                break;
+
+            case 8:     /* 1.75 sec */
+            case 9:     /* 1.82 sec */
                 break;
 
-            case 9:     /* 14 sec */
+            case 10:    /* 2.08 secs */
                 /* gave it our best shot, but alas... If this really
                  * is a child we are trying to kill and it really hasn't
                  * exited, we will likely fail to bind to the port