httpd-dev mailing list archives

From Jeff Trawick <traw...@gmail.com>
Subject Re: [PATCH] fix child reclaim timing
Date Fri, 13 Aug 2004 22:25:32 GMT
On Fri, 13 Aug 2004 16:48:42 -0400, Arliss, Noah <narliss@netegrity.com> wrote:
> I'd like to comment further... Not only is a disturbing message sent to the
> error log, but a SIGTERM is also sent to the child process. If I understand
> correctly the SIGTERM will likely interrupt any properly implemented child
> process shutdown and the child process will exit ungracefully.

for worker MPM, at least:

child processes have a SIGTERM handler that simply sets a flag and
returns to whatever was happening before; it will be the main thread
of a child that receives a message via another mechanism which tells
it to wake up and decide to exit

the SIGTERM isn't expected to interrupt any important processing going
on in the child (be it worker threads or child exit hook)
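
a minimal sketch of the "set a flag and return" pattern described above, in plain POSIX C; this is not the worker MPM source, and the names shutdown_pending, on_sigterm, install_sigterm_handler and should_exit are made up for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag: written by the handler, polled by ordinary code. */
static volatile sig_atomic_t shutdown_pending = 0;

/* The handler only records that SIGTERM arrived, then returns to
 * whatever was running before the signal was delivered. */
static void on_sigterm(int signo)
{
    (void)signo;
    shutdown_pending = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_sigterm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted syscalls where possible */
    return sigaction(SIGTERM, &sa, NULL);
}

/* Polled at a safe point to decide whether to begin an orderly exit. */
static int should_exit(void)
{
    return shutdown_pending != 0;
}
```

with a handler like this, a second or third SIGTERM just stores the same value again, which is why repeating the signal should neither help nor hurt code that is already running.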

SIGTERM is sent multiple times to work around any signal loss or other
glitch (not sure when this is effective in reality); I don't see how
it is harmful to any code that must run

the SIGKILL is what yanks the rug out from under the child and any
child exit hooks; the web server simply must exit in a reasonable
timeframe if the administrator tells it to, stuck code or not

> If it's
> acceptable to wait longer then the kill call should also be postponed to
> give modules a chance to cleanup gracefully. If any module has complex IPC
> or Mutexes in use, graceful shutdown is important especially if
> MaxRequestsPerChild is in use on a server with heavy load.

yes, the SIGKILL is the measure of last resort; shouldn't be sent for
a while after we start shutting down

here is a current example:

(I don't actually know when shutdown started; I should add a debug msg
there; but it is a very short time before this uninteresting mess
starts)
[Mon Jun 14 09:15:11 2004] [warn] child process 3906 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:12 2004] [warn] child process 3907 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:12 2004] [warn] child process 3924 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:12 2004] [warn] child process 3925 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:12 2004] [warn] child process 3926 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:12 2004] [warn] child process 3906 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:12 2004] [warn] child process 3907 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:12 2004] [warn] child process 3924 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:12 2004] [warn] child process 3925 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:12 2004] [warn] child process 3926 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:13 2004] [warn] child process 3906 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:13 2004] [warn] child process 3907 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:13 2004] [warn] child process 3924 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:13 2004] [warn] child process 3925 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:13 2004] [warn] child process 3926 still did not
exit, sending a SIGTERM
[Mon Jun 14 09:15:14 2004] [error] [client 127.0.0.1] request failed:
error reading the headers
[Mon Jun 14 09:15:15 2004] [info] (9)Bad file number:
core_output_filter: writing data to the network
[Mon Jun 14 09:15:17 2004] [error] child process 3906 still did not
exit, sending a SIGKILL
[Mon Jun 14 09:15:34 2004] [info] removed PID file
/export/home/trawick/inst/20/logs/httpd.pid (pid=3903)
[Mon Jun 14 09:15:34 2004] [notice] caught SIGTERM, shutting down

if SIGTERM simply sets a flag and returns, what is the use of repeating
the SIGTERM over and over?  for worker MPM it doesn't help or hurt
AFAICT; worker does something else to wake up its children prior to
calling the code Joe has a patch for

this sounds a bit more sane to me for timing, as long as we can exit
as soon as all children have exited:

shutdown + 0:
  send SIGTERM
shutdown + 4:
  for each child still remaining, bitch to error log and send SIGTERM again
shutdown + 8:
  for each child still remaining, bitch to error log and send SIGTERM again
shutdown + 12:
  for each child still remaining, bitch to error log and send SIGKILL
shutdown + 16:
  for each child still remaining, bitch to error log, send SIGKILL again, 
    and exit anyway
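
the schedule above could be written down as a small table, roughly like this; it is only a sketch of the proposed timing, not the existing reclaim code or Joe's patch, and struct reclaim_step, reclaim_schedule and step_at are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* One entry per escalation point in the proposed timing. */
struct reclaim_step {
    int at_secs;   /* seconds since shutdown began */
    int signo;     /* signal to send to each remaining child */
    int give_up;   /* nonzero: parent exits after this step regardless */
};

static const struct reclaim_step reclaim_schedule[] = {
    {  0, SIGTERM, 0 },
    {  4, SIGTERM, 0 },   /* log a warning, SIGTERM again */
    {  8, SIGTERM, 0 },   /* log a warning, SIGTERM again */
    { 12, SIGKILL, 0 },   /* log an error, SIGKILL */
    { 16, SIGKILL, 1 },   /* log an error, SIGKILL again, exit anyway */
};

/* Return the most recent step whose time has been reached, or NULL
 * if elapsed_secs is before the first step. */
static const struct reclaim_step *step_at(int elapsed_secs)
{
    const struct reclaim_step *found = NULL;
    size_t n = sizeof(reclaim_schedule) / sizeof(reclaim_schedule[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        if (reclaim_schedule[i].at_secs <= elapsed_secs)
            found = &reclaim_schedule[i];
    }
    return found;
}
```

the parent would still need to check between steps whether any children remain (for example with a non-blocking waitpid()) so that it can exit as soon as they are all gone rather than waiting for the next step.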

if somebody suspects that sending SIGTERM every second is going to
help some MPM+platform, that would be great to know
