httpd-dev mailing list archives

From Greg Ames <grega...@remulak.net>
Subject [PATCH] PR 34514 recover from transient thread create failures
Date Tue, 03 May 2005 19:12:47 GMT

http://people.apache.org/~gregames/thread_create_recovery.patch

design:

* exit with APEXIT_CHILDSICK for thread create failures (same as other patches)
* add logic to the parent to decide how bad these errors really are.  if we
can't initialize even a single worker process, just give up.  otherwise treat
them as transient errors.
* nuke the 10 sec delays in the exiting children.  it's better to let the parent
know what's happening right away.  if the parent creates a fork-a-thon (not seen
in my testing), the parent is broken.  there is existing logic to minimize the
fork rate for APEXIT_CHILDSICK.
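
a rough sketch of the parent-side decision in the second bullet, for illustration
only.  this is not code from the patch: parent_state, note_child_initialized(),
on_childsick_exit() and SKETCH_APEXIT_CHILDSICK are hypothetical names, and the
exit code value is a local stand-in rather than something taken from the httpd
headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* local stand-in for httpd's APEXIT_CHILDSICK; the value is an assumption */
#define SKETCH_APEXIT_CHILDSICK 0x7

typedef enum { PARENT_CONTINUE, PARENT_SHUTDOWN } parent_action;

/* hypothetical bookkeeping kept by the parent process */
typedef struct {
    bool any_child_initialized; /* has any worker process come up fully? */
    int  sick_exits;            /* children that exited "sick" so far */
} parent_state;

/* called when a worker process reports that it finished initializing */
void note_child_initialized(parent_state *st)
{
    assert(st != NULL);
    st->any_child_initialized = true;
}

/* called when a child exited with the "sick" code, e.g. after a failed
 * thread create.  if no worker process has ever initialized, give up;
 * otherwise treat the failure as transient and keep the server running. */
parent_action on_childsick_exit(parent_state *st)
{
    assert(st != NULL);
    st->sick_exits++;
    if (!st->any_child_initialized) {
        return PARENT_SHUTDOWN;
    }
    return PARENT_CONTINUE;
}
```

the point of the split is that the child only reports the failure, right away,
and the parent alone decides whether it is fatal.  limiting the fork rate after
a sick exit is left out of the sketch.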

Greg

----------------------------------------------------------------------------

gory details of the failure I looked at:

httpd went down due to an error in a new child.  the problem child logged

  [alert] (12)Not enough space: apr_thread_create: unable to create listener thread

the parent logged

  [alert] Child 11700 returned a Fatal error... Apache is exiting!

and then it shut down the whole server.  there was no prior indication of a
problem with the older children.

my guess is that the kernel was temporarily out of some memory resource needed
to create a thread.  fwiw, this was on HP-UX 11.0.  but I'm really not
interested in those details.  I'd rather make httpd more robust when this type
of situation occurs.


