httpd-dev mailing list archives

From Joe Orton <jor...@redhat.com>
Subject [PATCH] fix child reclaim timing
Date Fri, 13 Aug 2004 13:51:23 GMT
The 2.0 ap_reclaim_child_processes logic seems to be broken: it never
resets the waittime variable as 1.3 did, so the parent will wait for up
to 23 minutes in total for a stuck child process.  (SIGSTOP a child and
strace the parent to see for yourself.)
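
For anyone who wants to check the 23-minute figure, here is a rough sketch of the arithmetic. It assumes the unpatched loop sleeps once per try for nine tries, starting at 16 * 1024 usec and quadrupling each time; the factor of four is inferred from the 16/82/344 ms comments in the code and is not shown in the diff below. total_wait_usec is a made-up helper, not anything in httpd.

```c
/* Hypothetical helper, not part of httpd: sum the sleeps of a backoff
 * loop that never resets its delay.  Assumes the delay quadruples after
 * every try (inferred from the 16/82/344 ms comments in mpm_common.c). */
long long total_wait_usec(int tries, long long first_delay_usec)
{
    long long waittime = first_delay_usec;
    long long total = 0;
    int i;

    for (i = 0; i < tries; i++) {
        total += waittime;   /* stands in for apr_sleep(waittime) */
        waittime *= 4;       /* exponential backoff, never reset */
    }
    return total;
}
```

Under those assumptions, total_wait_usec(9, 16 * 1024) comes to 1431650304 usec, a bit under 24 minutes.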

This updates the logic to be a little more sane:

- at t + 16, 82, 344 ms, just waitpid()
- at t + 360, 425, 688 ms, waitpid() else SIGTERM the child
- at t + 1.74 secs, waitpid() else SIGKILL the child
- at t + 1.75, 1.82 secs, just waitpid()
- at t + 2.08 secs, waitpid() else log "this child won't die" 
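
The schedule above can be reproduced with a small sketch. It assumes each try sleeps for waittime and then multiplies it by four (inferred from the comments, not shown in the diff), and that waittime is reset to 16 * 1024 usec after tries 3 and 7 as in the patch. patched_schedule is a hypothetical name used only for illustration.

```c
/* Hypothetical helper, not part of httpd: fill out[0..9] with the
 * elapsed time in usec at which tries 1..10 of the patched loop run.
 * Assumes sleep-then-quadruple per try, with waittime reset after
 * tries 3 and 7 as the patch does. */
void patched_schedule(long out[10])
{
    long waittime = 16 * 1024;   /* initial delay in usec */
    long elapsed = 0;
    int tries;

    for (tries = 1; tries <= 10; ++tries) {
        elapsed += waittime;     /* stands in for apr_sleep(waittime) */
        waittime *= 4;
        out[tries - 1] = elapsed;

        if (tries == 3 || tries == 7) {
            waittime = 16 * 1024;   /* the reset added by the patch */
        }
    }
}
```

With these assumptions the entries come out at roughly 16, 82, 344, 360, 425, 688 ms and then 1.74, 1.75, 1.82, 2.08 secs, which lines up with the case comments in the patch.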

Any comments?

Index: mpm_common.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm_common.c,v
retrieving revision 1.120
diff -u -r1.120 mpm_common.c
--- mpm_common.c	15 Mar 2004 23:08:41 -0000	1.120
+++ mpm_common.c	13 Aug 2004 13:42:47 -0000
@@ -70,7 +70,7 @@
 
     ap_mpm_query(AP_MPMQ_MAX_DAEMON_USED, &max_daemons);
 
-    for (tries = terminate ? 4 : 1; tries <= 9; ++tries) {
+    for (tries = terminate ? 4 : 1; tries <= 10; ++tries) {
         /* don't want to hold up progress any more than
          * necessary, but we need to allow children a few moments to exit.
          * Set delay with an exponential backoff.
@@ -98,13 +98,15 @@
             switch (tries) {
             case 1:     /*  16ms */
             case 2:     /*  82ms */
+                break;
+                
             case 3:     /* 344ms */
-            case 4:     /*  16ms */
+                waittime = 16 * 1024;
                 break;
-
-            case 5:     /*  82ms */
-            case 6:     /* 344ms */
-            case 7:     /* 1.4sec */
+                
+            case 4:     /* 360ms */
+            case 5:     /* 425ms */
+            case 6:     /* 688ms */
                 /* ok, now it's being annoying */
                 ap_log_error(APLOG_MARK, APLOG_WARNING,
                              0, ap_server_conf,
@@ -114,7 +116,7 @@
                 kill(pid, SIGTERM);
                 break;
 
-            case 8:     /*  6 sec */
+            case 7:     /*  1.74 sec */
                 /* die child scum */
                 ap_log_error(APLOG_MARK, APLOG_ERR,
                              0, ap_server_conf,
@@ -132,9 +134,14 @@
                  */
                 kill_thread(pid);
 #endif
+                waittime = 16 * 1024;
+                break;
+
+            case 8:     /* 1.75 sec */
+            case 9:     /* 1.82 sec */
                 break;
 
-            case 9:     /* 14 sec */
+            case 10:    /* 2.08 secs */
                 /* gave it our best shot, but alas...  If this really
                  * is a child we are trying to kill and it really hasn't
                  * exited, we will likely fail to bind to the port
