From: Chuck Murcko <chuck@telebase.com>
Message-Id: <199608211929.PAA09421@telebase.com.>
Subject: Re: irix 5.3 and 1.1.1
To: new-httpd@hyperreal.com
Date: Wed, 21 Aug 1996 15:29:17 -0400 (EDT)
In-Reply-To: <4vafvd$5n6@re.hotwired.com> from "Dean Gaudet" at Aug 19,
 96 07:40:29 pm
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-new-httpd@apache.org
Precedence: bulk
Reply-To: new-httpd@hyperreal.com

Are we getting an unexpected EINVAL or something like it that's messing
up the mutex operation? We just found something like that here with
some Solaris software we'd written.

One possible scenario that could cause this is a bad pointer stepping
on the lock_it or unlock_it structures that get passed to the fcntl()
calls.
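
A rough way to check for that, as a sketch only: verify the flock
structures just before each fcntl() call and report the errno instead
of failing silently. The names mutex_init, flock_intact and mutex_op
are made up for illustration and are not taken from http_main.c; the
whole-file F_WRLCK/F_UNLCK layout is my assumption about how lock_it
and unlock_it are set up.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-ins for the lock_it / unlock_it structures. */
static struct flock lock_it;
static struct flock unlock_it;

/* Fill both structures with the values a whole-file lock expects. */
static void mutex_init(void)
{
    memset(&lock_it, 0, sizeof lock_it);
    lock_it.l_type = F_WRLCK;
    lock_it.l_whence = SEEK_SET;
    lock_it.l_start = 0;
    lock_it.l_len = 0;              /* 0 means "to end of file" */

    memset(&unlock_it, 0, sizeof unlock_it);
    unlock_it.l_type = F_UNLCK;
    unlock_it.l_whence = SEEK_SET;
    unlock_it.l_start = 0;
    unlock_it.l_len = 0;
}

/* Return 1 if the structure still holds what mutex_init() put there. */
static int flock_intact(const struct flock *fl, short expected_type)
{
    return fl->l_type == expected_type
        && fl->l_whence == SEEK_SET
        && fl->l_start == 0
        && fl->l_len == 0;
}

/*
 * Apply one lock operation. Retries on EINTR; anything else is logged.
 * Returns 0 on success, otherwise the errno value (EINVAL if the
 * structure was found overwritten before fcntl() was even called).
 */
static int mutex_op(int fd, struct flock *fl, short expected_type,
                    const char *what)
{
    int ret;

    if (!flock_intact(fl, expected_type)) {
        fprintf(stderr, "%s: flock struct overwritten (l_type=%d l_whence=%d)\n",
                what, (int) fl->l_type, (int) fl->l_whence);
        return EINVAL;
    }

    do {
        ret = fcntl(fd, F_SETLKW, fl);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0) {
        int saved = errno;          /* fprintf may change errno */
        fprintf(stderr, "%s: fcntl failed: %s\n", what, strerror(saved));
        return saved;
    }
    return 0;
}
```

Calling mutex_op(fd, &lock_it, F_WRLCK, "lock") and
mutex_op(fd, &unlock_it, F_UNLCK, "unlock") around accept() would at
least tell us whether the structures are getting clobbered or whether
fcntl() itself is returning something unexpected.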

Dean Gaudet liltingly intones:
> 
> I think I'm running into the children-not-dying problem on irix 5.3
> under 1.1.1.  I applied a patch from Ben that I thought dealt with this,
> but it doesn't seem to be working.  Have I missed another related patch?
> I'll include (part of) Ben's patch below for reference (revision numbers
> are mine, not hyperreal's).
> 
> The symptom is:  the machine's load shoots up to 27+ at which point a
> monitoring script I have running cuts in and kills the webserver and
> restarts it.
> 
> Dean
> 
> Index: http_main.c
> ===================================================================
> RCS file: /hot/repository/apache/src/http_main.c,v
> retrieving revision 1.16
> retrieving revision 1.17
> diff -c -r1.16 -r1.17
> *** http_main.c	1996/08/02 07:19:49	1.16
> --- http_main.c	1996/08/02 07:23:10	1.17
> ***************
> *** 845,853 ****
>   #endif
>   }
>   
>   int wait_or_timeout (int *status)
>   {
> !     wait_or_timeout_retval = -1;
>       
>   #if defined(NEXT)
>       if (setjmp(wait_timeout_buf) != 0) {
> --- 845,874 ----
>   #endif
>   }
>   
> + #ifdef BROKEN_WAIT
> + /*
> + Some systems appear to fail to deliver dead children to wait() at times.
> + This sorts them out.
> + */
> + void reap_children()
> +     {
> +     int status,n;
> + 
> +     for(n=0 ; n < HARD_SERVER_LIMIT ; ++n)
> + 	if(scoreboard_image->servers[n].status != SERVER_DEAD
> + 	   && waitpid(scoreboard_image->servers[n].pid,&status,WNOHANG) == -1
> + 	   && errno == ECHILD)
> + 	    {
> + 	    sync_scoreboard_image();
> + 	    update_child_status(n,SERVER_DEAD,NULL);
> + 	    }
> +     }
> + #endif
> + 
>   int wait_or_timeout (int *status)
>   {
> !     int wait_or_timeout_retval = -1;
> !     static int ntimes;
>       
>   #if defined(NEXT)
>       if (setjmp(wait_timeout_buf) != 0) {
> ***************
> *** 857,863 ****
>   	errno = ETIMEDOUT;
>   	return wait_or_timeout_retval;
>       }
> !     
>       signal (SIGALRM, longjmp_out_of_alarm);
>       alarm(1);
>   #if defined(NEXT)
> --- 878,890 ----
>   	errno = ETIMEDOUT;
>   	return wait_or_timeout_retval;
>       }
> ! #ifdef BROKEN_WAIT
> !     if(++ntimes == 60)
> ! 	{
> ! 	reap_children();
> ! 	ntimes=0;
> ! 	}
> ! #endif
>       signal (SIGALRM, longjmp_out_of_alarm);
>       alarm(1);
>   #if defined(NEXT)
> 

chuck
Chuck Murcko	N2K Inc.	Wayne PA	chuck@telebase.com
And now, on a lighter note:
Our OS who art in CPU, UNIX be thy name.
	Thy programs run, thy syscalls done,
	In kernel as it is in user!