httpd-dev mailing list archives

From Ben Laurie <...@algroup.co.uk>
Subject Re: [PATCH] bringing down the server from an MPM thread
Date Fri, 28 Jul 2000 18:38:54 GMT
dean gaudet wrote:
> 
> On Wed, 26 Jul 2000, Ben Laurie wrote:
> 
> > dean gaudet wrote:
> > >
> > > On Tue, 25 Jul 2000 rbb@covalent.net wrote:
> > >
> > > > This is wrong.  Apache 1.3 never killed off the whole server because a
> > > > child process had problems, at least not that I am aware of.
> > >
> > > it did.
> > >
> > > see http_main.c.  that's what APEXIT_CHILDFATAL means -- the world is
> > > fucked, the child doesn't think things can continue.
> > >
> > >     if ((WIFEXITED(status)) &&
> > >         WEXITSTATUS(status) == APEXIT_CHILDFATAL) {
> > >         ap_log_error(APLOG_MARK, APLOG_ALERT|APLOG_NOERRNO, server_conf,
> > >                         "Child %d returned a Fatal error... \n"
> > >                         "Apache is exiting!",
> > >                         pid);
> > >         exit(APEXIT_CHILDFATAL);
> > >     }
> >
> > That's scary!
> 
> i'm not sure there's an alternative though.
> 
> it could try a cleanup -- as long as it protects itself from nested fatal
> errors and just dies on subsequent fatals.

What scares me most is that a corrupt child could exit with that status
and take out the server. Something a bit more solid would be good.
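
To make the concern concrete, here is a minimal sketch (not the actual http_main.c code) of what the parent has to go on when it reaps a child. The value given to APEXIT_CHILDFATAL and the helper names child_reported_fatal and run_child_and_classify are assumptions for illustration only. The point is that the wait status carries nothing but an exit code, so any child that happens to exit with that code, corrupt or not, looks the same to the parent.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed value for illustration; the real definition lives in Apache's headers. */
#define APEXIT_CHILDFATAL 0xf

/* Same test as the quoted snippet: a normal exit carrying the fatal code. */
static int child_reported_fatal(int status)
{
    return WIFEXITED(status) && WEXITSTATUS(status) == APEXIT_CHILDFATAL;
}

/*
 * Hypothetical helper: fork a child that exits with exit_code, reap it,
 * and report how the parent would classify it.
 * Returns 1 for "fatal", 0 otherwise, -1 if fork or waitpid fails.
 */
static int run_child_and_classify(int exit_code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);   /* child: only the exit code reaches the parent */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return child_reported_fatal(status);
}
```

Calling run_child_and_classify(APEXIT_CHILDFATAL) is classified as fatal even though the child did no checking of its own, which is exactly the case that worries me.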

> at some point you want a human to make a decision, and this would seem to
> be one of them.  it's going to be much better than putting in complex
> decision making foo which is only going to screw up more.

Agreed.

> (there was this time that VCS, veritas cluster server, decided a host was
> down and double-mounted all of its filesystems on the hot spare...
> meanwhile the host was up, but kernel NFS was starving userland so much it
> couldn't heartbeat... massive corruption resulted.  just say no to
> automated failover!  i'd rather staff a 24x7 NOC or spend more effort on
> better monitoring :)

I've been saying no to automated failover for at least two decades now,
and this is a classic example of why!

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html

Coming to ApacheCon Europe 2000? http://apachecon.com/
