httpd-dev mailing list archives

From Marc Slemko <ma...@znep.com>
Subject Re: [SHOWSTOPPER] SIGHUP caused a fatal and silent crash
Date Mon, 03 Feb 1997 07:11:09 GMT
And FreeBSD.  What I see is all the children (a hundred or two) except
(generally) one sitting as zombies, waiting for the parent to collect
their exit status.  The one process still left carries on happily
servicing the request it is on; I have seen it send several documents
on a keepalive connection, but I am not sure whether it gets any more
requests after that.

My suspicions are simply that a signal is being lost somewhere.  If I
send a HUP to the child that is hanging around, it exits and
everything goes on like magic.  That means we either need to find
out why it is being lost or make the parent retry the signal or,
preferably, both.

I will make a guess that the parent gets stuck in:

void reclaim_child_processes ()
{
    int i, status;
    int my_pid = getpid();

    sync_scoreboard_image();
    for (i = 0; i < HARD_SERVER_LIMIT; ++i) {
        int pid = scoreboard_image->servers[i].pid;

        if (pid != my_pid && pid != 0)
            waitpid (scoreboard_image->servers[i].pid, &status, 0);
    }
}

...in the waitpid().  Let's say we put an alarm() before the waitpid() that
drops into a routine which sends another HUP to the children and
then tries this again.  Some sort of counter would be good so it
could send progressively stronger signals the more it was called, or 
eventually just skip that child and log a warning.

Or is WNOHANG portable enough and useful?  Could perhaps work that in
without needing an ugly signal handler.
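
If WNOHANG is usable, a polling version needs no handler at all.
Again only a sketch: reap_nonblocking() is a made-up name, and the
one-second sleep and the retry count are arbitrary assumptions.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not in the Apache source).  Polls one child
 * with WNOHANG; each time it is still alive, resends SIGHUP and sleeps
 * a second before trying again.
 * Returns 1 if the child was reaped, 0 if it is still running after
 * max_tries polls, -1 on a waitpid() error such as ECHILD. */
static int reap_nonblocking(pid_t pid, int max_tries)
{
    int status, tries;

    for (tries = 0; tries < max_tries; ++tries) {
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return 1;           /* exit status collected */
        if (r == -1 && errno != EINTR)
            return -1;          /* not our child, or already reaped */

        /* r == 0: child has not exited yet, so retry the signal in
         * case the first one was lost. */
        kill(pid, SIGHUP);
        sleep(1);
    }
    return 0;                   /* caller can log a warning and move on */
}
```

A caller could try a stronger signal, or just log and skip the child,
when this returns 0.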

On Thu, 30 Jan 1997, Dean Gaudet wrote:

> Ditto on IRIX 5.3 with 1.1.1.  I didn't bother reporting it 'cause
> graceful restart was being worked on.
> 
> Dean
> 
> On Thu, 30 Jan 1997, Ed Korthof wrote:
> 
> > While someone is looking at this -- on Solaris 2.5, on a heavily loaded
> > server, for some reason SIGHUP frequently fails to take down all the
> > children.  The server then hangs with a small number of children, and
> > waits for them to finish MaxRequestsPerChild.  Easily reproducible.
> > 
> >      -- Ed Korthof        |  Web Server Engineer --
> >      -- ed@organic.com    |  Organic Online, Inc --
> >      -- (415) 278-5676    |  Fax: (415) 284-6891 --
> > 
> > On Fri, 31 Jan 1997, Rob Hartill wrote:
> > 
> > > 
> > > My old friends at LANL just reported their 1.2b6 silently died after a
> > > SIGHUP. A SIGHUP later on worked. They'd just upgraded to 1.2b6 from a
> > > b5-dev I think.
> > > 
> > > That with HPUX.
> > > 
> > > 
> > 
> > 
> 

