httpd-dev mailing list archives

From "Roy T. Fielding" <field...@kiwi.ics.uci.edu>
Subject Re: night of the dead Apache
Date Sat, 01 Nov 1997 03:55:26 GMT
Is it possible that the parent is killing off the only child
that has the mutex lock, and that killing it doesn't free the lock?

I was able to truss all five processes (parent and four children),
and after the big sequence the four children each end in

alarm(15)                                       = 0
read(3, 0x00094928, 4096)       (sleeping...)
signotifywait()                                 = 14
    Received signal #14, SIGALRM, in read() [caught]
lwp_sigredirect(1, SIGALRM)                     = 0
read(3, 0x00094928, 4096)                       Err#4 EINTR
sigprocmask(SIG_SETMASK, 0xEF5659D4, 0x00000000) = 0
sigaction(SIGPIPE, 0xEFFFA2F0, 0xEFFFA3F0)      = 0
close(3)                                        = 0
getcontext(0xEFFFA278)
setcontext(0xEFFFA278)
sigaction(SIGURG, 0xEFFFEB88, 0xEFFFEC88)       = 0
sigaction(SIGPIPE, 0xEFFFEB88, 0xEFFFEC88)      = 0
sigaction(SIGALRM, 0xEFFFEB88, 0xEFFFEC88)      = 0
sigaction(SIGUSR1, 0xEFFFEB88, 0xEFFFEC88)      = 0
alarm(0)                                        = 0
time()                                          = 878355616
lwp_mutex_lock(0xEF750000)      (sleeping...)


which indicates that the keepalive timeout triggered, the socket was
closed, and this child is now waiting for the mutex.  Then, a few
seconds later, the parent does

time()                                          = 878355616
waitid(P_ALL, 0, 0xEFFFED58, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFCDD8, 0, 1000)                       = 0
time()                                          = 878355617
kill(27381, SIGUSR1)                            = 0
waitid(P_ALL, 0, 0xEFFFED58, WEXITED|WTRAPPED|WNOHANG) = 0
signotifywait()                                 = 18
poll(0xEFFFCDD8, 0, 1000)                       = 0
time()                                          = 878355618
kill(27381, SIGUSR1)                            = 0
waitid(P_ALL, 0, 0xEFFFED58, WEXITED|WTRAPPED|WNOHANG) = 0
time()                                          = 878355618
waitid(P_ALL, 0, 0xEFFFED58, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFCDD8, 0, 1000)                       = 0
time()                                          = 878355619
kill(27380, SIGUSR1)                            = 0
waitid(P_ALL, 0, 0xEFFFED58, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFCDD8, 0, 1000)                       = 0
time()                                          = 878355620
kill(27380, SIGUSR1)                            = 0
waitid(P_ALL, 0, 0xEFFFED58, WEXITED|WTRAPPED|WNOHANG) = 0
time()                                          = 878355620
waitid(P_ALL, 0, 0xEFFFED58, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFCDD8, 0, 1000)                       = 0
time()                                          = 878355621


which kills off the two children that were created during the
big sequence (not trussed).  One of those two had the mutex.
Hmmm, time to test USE_PTHREAD_SERIALIZED_ACCEPT.
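
To make the hypothesis concrete, here is a minimal sketch of the failure
mode, not Apache code.  It assumes a POSIX system where a default
(non-robust) process-shared pthread mutex stays locked when its owner
dies, which I believe is the case on Linux/glibc.  The function name
orphaned_lock_is_still_held is hypothetical, and SIGKILL stands in for
the SIGUSR1 seen in the truss output so that the child's death is
unconditional.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for illustration only.
 * Returns 1 if the mutex is still locked after the child holding it
 * has been killed, 0 if the parent could take it, -1 on setup failure. */
int orphaned_lock_is_still_held(void)
{
    /* Shared anonymous memory, so parent and child use the same mutex. */
    pthread_mutex_t *mtx = mmap(NULL, sizeof(*mtx), PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mtx == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    assert(rc == 0);
    rc = pthread_mutex_init(mtx, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    /* Pipe used only so the parent knows the child already holds the lock. */
    int ready[2];
    if (pipe(ready) != 0) {
        munmap(mtx, sizeof(*mtx));
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(ready[0]);
        close(ready[1]);
        munmap(mtx, sizeof(*mtx));
        return -1;
    }
    if (pid == 0) {
        /* Child: take the mutex, report it, then sleep until killed. */
        close(ready[0]);
        if (pthread_mutex_lock(mtx) != 0)
            _exit(1);
        if (write(ready[1], "x", 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }

    /* Parent: wait until the child holds the lock, then kill and reap it. */
    close(ready[1]);
    char c;
    ssize_t n = read(ready[0], &c, 1);
    close(ready[0]);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    if (n != 1) {
        munmap(mtx, sizeof(*mtx));
        return -1;
    }

    /* The owner is gone.  EBUSY here means nobody released the lock. */
    rc = pthread_mutex_trylock(mtx);
    int held = (rc == EBUSY);
    if (rc == 0)
        pthread_mutex_unlock(mtx);
    munmap(mtx, sizeof(*mtx));
    return held;
}
```

Built with something like cc -pthread and called from a small main(),
a return value of 1 would mean the lock outlived its holder, which is
the situation the remaining children above would be stuck in.  A
kernel-managed lock such as fcntl() is dropped when the holding process
exits, so it should not show this behaviour.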

.....Roy
