httpd-dev mailing list archives

From "GUMMALAM,MOHAN (HP-Cupertino,ex2)" <>
Subject [PATCH] Problems with MPM threaded
Date Sat, 14 Jul 2001 04:42:29 GMT
I propose the following patch [PATCH A].  It partially fixes the unwanted
child deaths problem (symptoms are described in the mails included below).
It does so by making sure that perform_idle_server_maintenance() does not
count the threads of the process that received the POD (pipe of death) when
calculating idle_thread_count.  To do that, I have used the
ap_scoreboard_image->parent[process_slot].process_status field.  For now I
am reusing an already defined value, SB_IDLE_DIE.  If the general idea is
acceptable, I can work on solidifying the details.  PATCH A is attached
below.
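
As a rough illustration of the counting change described above (not the
actual scoreboard code), the sketch below skips any process that has been
marked as dying when summing idle threads.  The struct, the enum, and the
function name are hypothetical stand-ins, and the real patch keys off
SB_IDLE_DIE in process_status instead.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scoreboard; not httpd types. */
enum proc_status { PROC_ACTIVE = 0, PROC_DYING = 1 };

struct proc_slot {
    enum proc_status status;  /* PROC_DYING once the child took the POD */
    int idle_threads;         /* idle worker threads in this child */
};

/* Sum idle threads, leaving out children that are already on their way out. */
static int count_idle_threads(const struct proc_slot *procs, size_t nprocs)
{
    int idle = 0;

    for (size_t i = 0; i < nprocs; i++) {
        if (procs[i].status == PROC_DYING)
            continue;         /* its threads must not trigger another POD */
        idle += procs[i].idle_threads;
    }
    return idle;
}
```

With six children of 25 idle threads each, marking one child as dying drops
the count from 150 to 125, so a limit of 125 is no longer exceeded.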

However, this patch exposes another problem in the code: with this fix,
although the untargeted children no longer get a POD, the targeted child
process does not die immediately either.  Here is why:  in the
worker_thread() routine in threaded.c, at the instant when worker 1.0
(written in <process_slot>.<thread_slot> format, i.e., 1 is the
process_slot and 0 is the thread_slot) gets the POD, the remaining threads
of process 1 are all waiting at apr_lock_acquire(accept_mutex).  If the
web server is really idle, the chances are slim that all the remaining
worker threads of process 1 will acquire the lock within a short time.
As a result, the remaining worker threads of process 1 do not die
immediately.  To resolve this:

SOLUTION 1:  I plan to temporarily implement a new
apr_lock_acquire_timeout() function, which would cause the threads waiting
on the mutex to give up after some time.  The piece of code would look
something like this:

while ((rv = SAFE_ACCEPT(apr_lock_acquire_timeout(accept_mutex, timevalue)))
       != APR_SUCCESS) {
	if (check_if_timer_popped) {
		if (workers_may_exit)
			break;
	} else {                       /* apr_lock_acquire failed */
		workers_may_exit = 1;
		break;
	}
}

I know that this would cause some performance impact, but my guess is that
it would not be a lot, especially if we keep the timevalue reasonably high.
However, in order to get the functionality of
perform_idle_server_maintenance() working right, we _will_ have to implement
the above solution (or maybe something similar)!
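
To make the timed-acquire idea concrete, here is a minimal sketch that uses
POSIX pthread_mutex_timedlock() in place of the proposed (not yet existing)
apr_lock_acquire_timeout().  The function name acquire_or_exit(), the
atomic flag, and the timeout in whole seconds are assumptions made for
illustration, not the threaded MPM's code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for the MPM's workers_may_exit flag. */
static atomic_int workers_may_exit;

/*
 * Wait for the accept mutex, waking up every timeout_sec seconds to
 * re-check the exit flag.
 * Returns 1 if the mutex was acquired, 0 if the worker should exit,
 * and -1 on an unexpected error.
 */
static int acquire_or_exit(pthread_mutex_t *accept_mutex, int timeout_sec)
{
    for (;;) {
        struct timespec deadline;
        int rc;

        if (atomic_load(&workers_may_exit))
            return 0;

        /* pthread_mutex_timedlock() expects an absolute CLOCK_REALTIME time. */
        if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
            return -1;
        deadline.tv_sec += timeout_sec;

        rc = pthread_mutex_timedlock(accept_mutex, &deadline);
        if (rc == 0)
            return 1;         /* got the lock: go on to accept a connection */
        if (rc != ETIMEDOUT)
            return -1;        /* a real locking failure, not just the timer */
        /* Timer popped: loop around and look at the flag again. */
    }
}
```

A larger timeout means fewer wakeups on an idle server, at the cost of
sibling threads taking up to that long to notice that they should exit.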

SOLUTION 2: One could use a slightly more _involved_ approach, where the
dying thread could send a signal to its sibling threads, each of which will
then handle that signal with a graceful exit.

SOLUTION 3: We could turn our heads the other way by not worrying about this
situation at all, since these threads would eventually die (when more
requests are handled by the webserver).  However, by ignoring the problem,
the purpose of perform_idle_server_maintenance() would be lost!!

Please respond with your thoughts, and if there are no objections, I will go
ahead and post a patch for SOLUTION 1 sometime soon.


************************* Start PATCH A ********************************
--- server/mpm/threaded/threaded.c.orig Tue Jul  3 06:58:10 2001
+++ server/mpm/threaded/threaded.c      Fri Jul 13 18:28:44 2001
@@ -495,7 +495,7 @@

 /* Sets workers_may_exit if we received a character on the pipe_of_death */
-static void check_pipe_of_death(void)
+static void check_pipe_of_death(int process_slot)
     if (!workers_may_exit) {
@@ -511,6 +511,7 @@
         else {
             /* It won the lottery (or something else is very
              * wrong). Embrace death with open arms. */
+           ap_scoreboard_image->parent[process_slot].process_status = SB_IDLE_DIE;
             workers_may_exit = 1;
@@ -584,7 +585,7 @@
             if (event & APR_POLLIN) {
                 /* A process got a signal on the shutdown pipe. Check if
                  * the lucky process to die. */
-                check_pipe_of_death();
+                check_pipe_of_death(process_slot);

@@ -972,6 +973,9 @@
        int status = SERVER_DEAD;
        int any_dying_threads = 0;
        int any_dead_threads = 0;

+       if (ap_scoreboard_image->parent[i].process_status == SB_IDLE_DIE)
+           continue;

        if (i >= ap_max_daemons_limit && free_length == idle_spawn_rate)
************************* End PATCH A ********************************

-----Original Message-----
From: GUMMALAM,MOHAN (HP-Cupertino,ex2) []
Sent: Tuesday, July 10, 2001 6:54 PM
To: ''
Subject: RE: Problems, 2.0.20, and HP-UX

My httpd.conf file is as follows:

<IfModule threaded.c>
StartServers         6
MaxClients           8
MinSpareThreads      5
MaxSpareThreads     125
ThreadsPerChild     25
MaxRequestsPerChild  0
</IfModule>

If I set MaxSpareThreads to 150 (which is equal to StartServers *
ThreadsPerChild), then everything works fine.  But whenever I set
MaxSpareThreads to 125 (or any value below that, say 100), I end up with
just two processes!  It takes a while (a couple of minutes) for this to
take effect.  Has anyone seen this behavior yet?  Any clues?
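
The arithmetic behind these settings can be sketched with a toy model of
the parent's maintenance loop.  This is an assumption-laden simplification:
the function pods_sent() is hypothetical, every thread is treated as idle,
a child that has taken a POD is assumed to linger for the rest of the run,
and MinSpareThreads respawning is ignored, so it does not reproduce the
observed end state of two processes.

```c
/*
 * Each cycle, count idle threads and send one pipe-of-death (POD) byte if
 * the count exceeds max_spare.  skip_dying selects whether children that
 * already took a POD are left out of the count.
 * Returns the number of PODs sent over `cycles` maintenance passes.
 */
static int pods_sent(int nprocs, int threads_per_child, int max_spare,
                     int cycles, int skip_dying)
{
    int dying = 0;   /* children that took a POD but have not exited yet */

    for (int c = 0; c < cycles && dying < nprocs; c++) {
        int counted = skip_dying ? nprocs - dying : nprocs;
        int idle_thread_count = counted * threads_per_child;

        if (idle_thread_count > max_spare)
            dying++;             /* one more child is told to die */
    }
    return dying;
}
```

In this model, 6 children of 25 threads against a limit of 150 never
trigger a POD, since 150 is not greater than 150.  Against a limit of 125,
counting lingering children keeps the total at 150 and a POD goes out on
every pass, whereas leaving dying children out of the count stops after a
single POD.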


-----Original Message-----
From: GUMMALAM,MOHAN (HP-Cupertino,ex2) []
Sent: Tuesday, July 10, 2001 4:58 PM
To: ''
Subject: Problems, 2.0.20, and HP-UX

Hi Apache2.0 devs: On HP-UX, we are facing a problem whose symptoms are
similar to the one seen with 2.0.19 release.  Upon starting Apache, the
number of processes running reduces to 2.  Upon further investigation, it
looked like the main process (watchdog process) went on a rampage and killed
all the child processes in the following piece of code:

static void perform_idle_server_maintenance(void)
    if (idle_thread_count > max_spare_threads) {
        /* Kill off one child */

By using tusc, we found that after the child processes were created, there
were a bunch of requests sent to kill the processes:

write(13, "! ", 1) ....................................... = 1

And although a lot of waitpid's were called earlier (each returning -1), the
following bunch of waitpid's succeeded after a while, all occurring together.

waitpid(-1, WIFEXITED(0), WNOHANG|WUNTRACED) ............. = 2122
waitpid(-1, WIFEXITED(0), WNOHANG|WUNTRACED) ............. = 2121
waitpid(-1, WIFEXITED(0), WNOHANG|WUNTRACED) ............. = 2120
waitpid(-1, WIFEXITED(0), WNOHANG|WUNTRACED) ............. = 2119
waitpid(-1, WIFEXITED(0), WNOHANG|WUNTRACED) ............. = 2118
waitpid(-1, WIFEXITED(0), WNOHANG|WUNTRACED) ............. = 2117

At the end of it, there are only two processes running.  I tried this out
with different StartServers values, (4, 5 and 6).  Each time I ended up with
the same final state (of two processes).  The above output is in the case of

I'm surprised that no one else has run into this, when it consistently shows
up on HP-UX.

We'll be investigating this further here, and will get back to you.  If
anyone else has run into this problem, do let me know.

