httpd-dev mailing list archives

From Brian Behlendorf <br...@hyperreal.com>
Subject Re: WWW Form Bug Report: "Scoreboard show long-since dead 'W' children" on Irix (fwd)
Date Sat, 20 Jul 1996 03:38:44 GMT


---------- Forwarded message ----------
Date: Wed, 17 Jul 1996 18:10:38 -0400
From: Bill Nesbitt <bn@hway.net>
To: ben@algroup.co.uk
Cc: new-httpd@hyperreal.com
Subject: Re: WWW Form Bug Report: "Scoreboard show long-since dead 'W' children" on Irix (fwd)

At 04:31 PM 7/17/96 +0100, Ben Laurie wrote:
>Rob Hartill wrote:
>> Message-Id: <199607170433.VAA01965@taz.hyperreal.com>
>> From: bn@hway.net
>> To: apache-bugs%apache.org@organic.com
>> Date: Tue Jul 16 21:33:03 1996
>> Subject: WWW Form Bug Report: "Scoreboard show long-since dead 'W' children" on Irix
>> 
>> Submitter: bn@hway.net
>> Operating system: Irix, version: 5.3 & 6.2
>> Version of Apache Used: 1.1.1
>> Extra Modules used: 
>> URL exhibiting problem: 
>> 
>> Symptoms:
>> --
>> After the server has been run a while
>> (few hours / several 100,000 hits), the scoreboard
>> starts to show slots occupied by servers reported
>> to be in the "W"rite mode with a lot of
>> "SS".  The processes actually do not exist when
>> checked by the ps command.  After a few days of
>> run time (several million hits) the scoreboard
>> becomes filled with these nonexistent "W"
>> processes.  Eventually, the server throttles
>> itself via the MaxClients and fails to accept any
>> more requests.  Any ideas?
>
>This is, of course, impossible. Once a second, Apache does a wait(), and hence
>should be notified of any dead children. The code in this area is fairly tight,
>and I can't really see any possibility of it failing to mark a child dead if
>notified. This leads me to suspect that Apache is never notified. It should
>be simple enough to check this theory by simply logging the child pids and
>checking that the offending children have never been logged:
>
>	int status, child_slot;
>	int pid = wait_or_timeout(&status);
>	
>	if (pid >= 0) {
>	    /* Child died... note that it's gone in the scoreboard. */
>/* ADD THIS (in http_main.c) */
>	    log_printf("Reaping %d",pid);
>	    sync_scoreboard_image();
>	    child_slot = find_child_by_pid (pid);
>	    if (child_slot >= 0)
>		(void)update_child_status (child_slot, SERVER_DEAD,
>		 (request_rec*)NULL);
>        }
>
>If this shows that Apache is not at fault, then the problem is in the kernel.
>It may be possible to work around the problem with a waitpid() - I believe
>someone tried that once with some success.
>
>Let us know what happens.
>
>Cheers,
>
>Ben.
>

Thanks for your quick reply.  You were correct in your diagnosis.  I wrote
this quick workaround.  It seems to work:

***************
*** 841,846 ****
--- 850,878 ----
  
  int wait_or_timeout (int *status)
  {
+     /* NEZ */
+     static int numcall = 0;
+     
+     numcall++;
+     /* 300 - approx. 5 mins */
+     if (numcall > 300) {
+ 	log_error ("Running Reap.", server_conf);
+ 	for (numcall = 0; numcall < HARD_SERVER_LIMIT; numcall++) {
+ 		if (scoreboard_image[numcall].status == SERVER_BUSY_WRITE) {
+ 			if (waitpid(scoreboard_image[numcall].pid, status, WNOHANG) == -1 && errno == ECHILD) {
+ 				char errstr[MAX_STRING_LEN];
+ 
+ 				sprintf (errstr, "Reaping slot: %d, pid: %d", numcall, scoreboard_image[numcall].pid);
+ 				log_error (errstr, server_conf);
+ 				sync_scoreboard_image();
+ 				(void)update_child_status (numcall, SERVER_DEAD, (request_rec*)NULL);
+ 			}
+ 		}
+ 	}
+ 	numcall = 0;
+     }
+     /* */
+ 
      wait_or_timeout_retval = -1;
      
  #if defined(NEXT)

--------------

Thanks again,
-Bill
-----
Bill Nesbitt
bn@hway.net
Hiway Technologies, Inc.
http://www.hway.net/
