httpd-dev mailing list archives

From Paul Querna <c...@force-elite.com>
Subject cgi: KILL_AFTER_TIMEOUT vs KILL_ALWAYS
Date Tue, 24 Oct 2006 16:38:07 GMT
When creating a subprocess (i.e., a CGI in this case), APR offers several 
choices for how to clean it up when the parent process exits or runs a 
pool cleanup.  With apr_pool_note_subprocess, the choices are:
  APR_KILL_NEVER         -- process is never sent any signals
  APR_KILL_ALWAYS        -- process is sent SIGKILL on apr_pool_t cleanup
  APR_KILL_AFTER_TIMEOUT -- SIGTERM, wait 3 seconds, SIGKILL
  APR_JUST_WAIT          -- wait forever for the process to complete
  APR_KILL_ONLY_ONCE     -- send SIGTERM and then wait
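
To make the difference between two of these choices concrete, here is a minimal POSIX sketch of what "SIGTERM, wait, SIGKILL" and "SIGKILL right away" amount to for a single child. It does not use APR and is not APR's implementation; every `sketch_` name is hypothetical, and the grace period is a parameter so that the 3 seconds can be shortened for experiments.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical policy names; they mirror two APR choices but are not APR's API. */
typedef enum { SKETCH_KILL_ALWAYS, SKETCH_KILL_AFTER_TIMEOUT } sketch_kill_policy;

/* Sleep for the given number of milliseconds. */
static void sketch_sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Terminate and reap `pid` according to `policy`.
 * Returns the signal that ended the child, 0 if it exited on its own,
 * or -1 if waitpid failed. */
static int sketch_cleanup_child(pid_t pid, sketch_kill_policy policy, long grace_ms)
{
    int status = 0;
    int reaped = 0;

    if (policy == SKETCH_KILL_AFTER_TIMEOUT) {
        long waited_ms = 0;
        kill(pid, SIGTERM);                      /* polite request first */
        while (waited_ms < grace_ms) {
            pid_t r = waitpid(pid, &status, WNOHANG);
            if (r < 0) return -1;
            if (r == pid) { reaped = 1; break; } /* child went away in time */
            sketch_sleep_ms(10);
            waited_ms += 10;
        }
    }
    if (!reaped) {
        /* If the caller dies before reaching this line, the child survives. */
        kill(pid, SIGKILL);
        if (waitpid(pid, &status, 0) != pid) return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Fork a child that ignores SIGTERM, standing in for a stuck CGI.
 * Returns the child's pid, or -1 if fork failed. */
static pid_t sketch_spawn_stubborn_child(void)
{
    /* Ignore SIGTERM before fork so the child inherits it without a race. */
    void (*old_handler)(int) = signal(SIGTERM, SIG_IGN);
    pid_t pid = fork();
    if (pid == 0) {
        for (;;) pause();                        /* only SIGKILL ends this */
    }
    signal(SIGTERM, old_handler);                /* restore in the parent */
    return pid;
}
```

Calling `sketch_cleanup_child(pid, SKETCH_KILL_AFTER_TIMEOUT, 3000)` on a child from `sketch_spawn_stubborn_child()` only ends that child at the final `kill(pid, SIGKILL)`, so everything depends on the caller living long enough to get there.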

Currently, mod_cgi{d} sets APR_KILL_AFTER_TIMEOUT.  However, on a server 
under high load (load average >100), it appears possible for only the 
initial SIGTERM to be sent; the follow-up SIGKILL never reaches the 
child CGI processes.

I believe that the parent process, which is supposed to have a 7 second 
gap between its own SIGTERM and SIGKILL, is receiving its SIGKILL before 
it has slept for 3 seconds *and* sent the final SIGKILL to the child 
CGIs.  This can leave stray child processes behind.
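
The suspected race can be written as a small timing model. This is only a sketch of the reasoning above, assuming the 7 second and 3 second figures quoted in this message; the function name and the `cleanup_delay_s` parameter (how long after the parent's own SIGTERM its pool cleanup actually starts) are hypothetical.

```c
#include <stdbool.h>

/* Figures taken from the message, treated here as assumptions. */
#define SKETCH_PARENT_GRACE_S 7.0  /* parent: SIGTERM to SIGKILL */
#define SKETCH_CHILD_GRACE_S  3.0  /* child:  SIGTERM to SIGKILL */

/* Returns true if the parent is still alive at the moment it would send
 * SIGKILL to the child, i.e. the cleanup finishes inside the parent's
 * own grace period. */
static bool sketch_child_gets_sigkill(double cleanup_delay_s)
{
    return cleanup_delay_s + SKETCH_CHILD_GRACE_S < SKETCH_PARENT_GRACE_S;
}
```

Under this model a cleanup that starts 1 second late still delivers the SIGKILL, while one delayed by 4 seconds or more (plausible at a load average above 100) does not, and the child only ever sees SIGTERM.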

In the real world, this was not limited to a single machine; it was 
widespread across a cluster of machines.  About 70% of them had this 
problem after an `apachectl stop`.

Attached is a patch to mod_cgi and mod_cgid that switches them to 
APR_KILL_ALWAYS, which appears to resolve this issue.

Is there anything seriously wrong with using SIGKILL first?

Thanks,

Paul
