httpd-dev mailing list archives

From: Jeff Trawick <traw...@gmail.com>
Subject: Re: [PATCH] fix child reclaim timing
Date: Fri, 13 Aug 2004 14:27:11 GMT
On Fri, 13 Aug 2004 14:51:23 +0100, Joe Orton <jorton@redhat.com> wrote:
> The 2.0 ap_reclaim_child_processes logic seems to be broken - it never
> resets the waittime variable as it did in 1.3; so the parent will wait
> for up to 23 minutes (sic) in total for a stuck child process.  (SIGSTOP
> a child and strace the parent to see for yourself)
> 
> This updates the logic to be a little more sane:
> 
> - at t + 16, 82, 344 ms, just waitpid()
> - at t + 425, 688, 1736 ms, waitpid() else SIGTERM the child
> - at t + 1.74 secs, waitpid() else SIGKILL the child
> - at t + 1.75, 1.82 secs, just waitpid()
> - at t + 2.08 secs, waitpid() else log "this child won't die"
> 
> Any comments?

Here is my take on what is wrong with current code:

1) It starts complaining a bit too soon.  Some third-party modules
have rather complicated child exit strategies.  Whether that is good
or bad (bad ;) ), it results in disturbing messages that wouldn't
have appeared if we were a little more patient (2-3 seconds).  Also, I
suspect that using a threaded MPM affects how quickly children exit
on Unix now.

2) It should never check for exited processes less often than every
1-2 seconds, even if it doesn't complain to the error log that often.
As you say, the current code can wait a VERY long time for child
processes to exit.  In practice, I see that it can wait a VERY long
time even after the last child has exited.

I agree that it should never wait that long, though I think around 15
seconds total is reasonable.  Exiting before the children are gone
doesn't let Apache start up any more quickly; it just prevents
potentially useful information about timing from being logged to the
error log.

--/--

I wouldn't complain to the error log at all until 2 seconds have
passed, and even then I'd wait around for another 10-15 seconds.  But
it has to check every second so that it finds out soon after all the
children have exited and doesn't sleep needlessly.
