httpd-dev mailing list archives

From "Paul J. Reder" <rede...@raleigh.ibm.com>
Subject Problem found in perform_idle_server_maintenance on prefork.
Date Tue, 27 Feb 2001 21:38:00 GMT
I ran a particularly abusive set of tests. Apache held up beautifully throughout, but I
found a problem at the end.

After the run, the number of children did not drop all the way back to the steady-state
number. I have been able to reproduce the problem.

What seems to happen is that perform_idle_server_maintenance always picks the
highest-numbered child (slot 196 in the trace below) to kill. In the problematic case
that child is still starting up. Apache tries to kill it by sending SIGWINCH. The child
tries to die but ends up in the following state:

#0  0x400e458b in __sigsuspend (set=0xbffff8ec) at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
#1  0x400bd211 in __pthread_lock (lock=0x80c8b1c, self=0x400c36c0) at restart.h:32
#2  0x400ba99a in __pthread_mutex_lock (mutex=0x80c8b0c) at mutex.c:84
#3  0x80b1b56 in apr_unix_lock_intra (lock=0x80c8aec) at intraproc.c:120
#4  0x80af22c in apr_lock_acquire (lock=0x80c8aec) at locks.c:113
#5  0x80a0e1b in apr_pool_destroy (a=0x80eebec) at apr_pools.c:761
#6  0x8069d97 in clean_child_exit (code=0) at prefork.c:233
#7  0x8069f82 in just_die (sig=28) at prefork.c:354
#8  0x400bc522 in pthread_sighandler (signo=28, ctx={gs = 0, __gsh = 0, fs = 0, __fsh = 0,
      es = 43, __esh = 0, ds = 43, __dsh = 0, edi = 3221224724, esi = 1073784336,
      ebp = 3221224232, esp = 3221224220, ebx = 197, edx = 135416640,
      ecx = 135424848, eax = 135416640, trapno = 16, err = 0, eip = 134875327,
      cs = 35, __csh = 0, eflags = 66183, esp_at_signal = 3221224220, ss = 43,
      __ssh = 0, fpstate = 0x0, oldmask = 2147483648, cr2 = 0}) at signals.c:91
#9  0x400e4408 in __restore () at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127
#10 0x80a0985 in apr_pool_sub_make (p=0x80eebec, apr_abort=0) at apr_pools.c:481
#11 0x80a0a48 in apr_pool_create (newcont=0xbffffb78, cont=0x80eebec) at apr_pools.c:545
#12 0x806a447 in child_main (child_num_arg=196) at prefork.c:543
#13 0x806a9ad in make_child (s=0x80cefd4, slot=196) at prefork.c:869
#14 0x806ac71 in perform_idle_server_maintenance () at prefork.c:1010
#15 0x806b048 in ap_mpm_run (_pconf=0x80ceaec, plog=0x80eabcc, s=0x80cefd4) at prefork.c:1180
#16 0x807073c in main (argc=3, argv=0xbffffd14) at main.c:428

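Apparently the child was in the midst of creating a pool (apr_pool_create calling
apr_pool_sub_make, frames #10 and #11) when the signal arrived. The just_die handler then
calls clean_child_exit, and apr_pool_destroy tries to acquire the pool lock (frames #2 to #5).
That lock is presumably still held by the interrupted apr_pool_sub_make, so the cleanup can't
get it and the child is deadlocked with itself. Apache tries to kill this child every second
and fails, and none of the other kids get cleaned up. The scoreboard lists this child in the
Starting state.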
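The self-deadlock above can be reproduced outside Apache with a few lines of POSIX threads code. This is a minimal sketch, assuming the pool lock behaves like an ordinary non-recursive pthread mutex (the trace shows apr_unix_lock_intra calling pthread_mutex_lock). It uses an error-checking mutex so that the second lock by the same owner returns EDEADLK instead of hanging, and it takes both locks in ordinary code rather than in a signal handler, since pthread_mutex_lock() is not async-signal-safe. The function name relock_result is hypothetical.

```c
#define _XOPEN_SOURCE 700  /* expose PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Locks the same mutex twice from one thread and returns the result of the
 * second pthread_mutex_lock(). The first lock stands in for the pool lock
 * held by apr_pool_sub_make(); the second stands in for the acquire in
 * apr_pool_destroy() reached from the signal handler. */
int relock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    /* Error-checking type: a relock by the owner fails with EDEADLK
     * instead of blocking forever. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);
    assert(rc == 0);               /* first acquire succeeds */
    rc = pthread_mutex_lock(&m);   /* the owner tries again */

    pthread_mutex_unlock(&m);      /* still held from the first lock */
    pthread_mutex_destroy(&m);
    return rc;
}
```

With a PTHREAD_MUTEX_NORMAL mutex the second call has no such check and simply blocks, which matches frames #0 to #5.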

If I kill this child, idle maintenance continues as normal, killing kids down to the
steady-state number.

Possible fixes:
---------------
  1) Don't allow perform_idle_server_maintenance to kill children in the Starting state.
        Kill the highest-numbered child not in the S state instead.
  2) Look into closing the timing window that allows this to happen. Continue allowing state
        S children to be killed.
  3) Keep track of the last pid killed. If it is the same pid N times in a row, change its
        state to Zombie and kill the next pid down. Log the presence of a zombie for debugging
        purposes. Don't try to kill Zombies.
  4) Something else more appropriate?
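Fix 1 amounts to a change in how the victim slot is chosen. A minimal sketch, assuming a simplified, hypothetical scoreboard (enum slot_state and pick_victim are illustrative names, not the prefork or scoreboard API): scan from the highest slot down and return the first slot that is fully Ready, so a child still in the Starting state is never signalled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified scoreboard states (not the real httpd names). */
enum slot_state { SLOT_DEAD, SLOT_STARTING, SLOT_READY, SLOT_BUSY };

/* Returns the highest-numbered slot whose child is idle and fully started,
 * or -1 if there is none. Slots in SLOT_STARTING are skipped, so a child
 * that is still initialising is never sent the graceful-stop signal. */
int pick_victim(const enum slot_state *slots, int nslots)
{
    int i;

    assert(nslots == 0 || slots != NULL);
    for (i = nslots - 1; i >= 0; i--) {
        if (slots[i] == SLOT_READY)
            return i;
    }
    return -1;
}
```

For slots {READY, BUSY, READY, STARTING}, the behaviour described above would pick slot 3; this sketch picks slot 2.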
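Fix 2 could take the classic form of closing a signal race: block the signal for the duration of the critical section, so a SIGWINCH sent while the child is still inside pool creation stays pending and is delivered only after that code has finished and released its lock. A minimal sketch, assuming a single-threaded child (in a threaded process pthread_sigmask() would be the call to use). run_with_sigwinch_blocked, on_graceful, install_handler and demo_critical are hypothetical names, and the handler here only sets a flag so the deferral is observable.

```c
#define _DEFAULT_SOURCE  /* glibc: expose sigaction() and SIGWINCH */
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; only async-signal-safe work happens in it. */
static volatile sig_atomic_t exit_requested = 0;

static void on_graceful(int sig)
{
    (void)sig;
    exit_requested = 1;
}

/* Install on_graceful() as the SIGWINCH handler. Returns 0 on success. */
int install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_graceful;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGWINCH, &sa, NULL);
}

/* Run fn(arg) with SIGWINCH blocked. A SIGWINCH that arrives meanwhile stays
 * pending and is delivered when the old mask is restored, i.e. only after
 * fn() (for example, the pool creation in the child) has finished.
 * Returns 0 on success, -1 if the signal mask could not be changed. */
int run_with_sigwinch_blocked(void (*fn)(void *), void *arg)
{
    sigset_t block, old;

    assert(fn != NULL);
    sigemptyset(&block);
    sigaddset(&block, SIGWINCH);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    fn(arg);
    return sigprocmask(SIG_SETMASK, &old, NULL);
}

/* Demo critical section: signal ourselves, then record whether the handler
 * has already run (it should not have, because SIGWINCH is blocked). */
void demo_critical(void *arg)
{
    int *ran_early = arg;

    raise(SIGWINCH);
    *ran_early = exit_requested;
}
```

Where exactly the blocked region would start and end in child_main is left open; the sketch only shows the blocking technique itself.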
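Fix 3 can be sketched as a small piece of state carried between maintenance passes. This assumes the same kind of hypothetical scoreboard as above plus a per-slot pid array; kill_tracker, pick_victim_tracked, SLOT_ZOMBIE and ZOMBIE_THRESHOLD (standing in for "N") are illustrative names. Starting children remain eligible, as today. Once a pid has been chosen ZOMBIE_THRESHOLD passes in a row, the next pass marks it Zombie, logs it to stderr, skips it from then on, and picks the next slot down instead.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

#define ZOMBIE_THRESHOLD 5   /* hypothetical value for "N" */

/* Hypothetical, simplified scoreboard states, plus the proposed Zombie. */
enum slot_state { SLOT_DEAD, SLOT_STARTING, SLOT_READY, SLOT_BUSY, SLOT_ZOMBIE };

/* State kept between maintenance passes. */
struct kill_tracker {
    pid_t last_pid;  /* pid chosen on the previous pass (0 = none) */
    int repeats;     /* consecutive passes that chose last_pid */
};

/* Returns the slot to signal on this pass, or -1. Idle slots (Starting or
 * Ready) are candidates, highest first; Zombie slots are never chosen. */
int pick_victim_tracked(enum slot_state *states, const pid_t *pids,
                        int nslots, struct kill_tracker *t)
{
    int i;

    assert(t != NULL && (nslots == 0 || (states != NULL && pids != NULL)));
    for (i = nslots - 1; i >= 0; i--) {
        if (states[i] != SLOT_STARTING && states[i] != SLOT_READY)
            continue;                      /* busy, dead or zombie */
        if (pids[i] == t->last_pid) {
            if (t->repeats >= ZOMBIE_THRESHOLD) {
                /* Signalled N times in a row without exiting: give up. */
                states[i] = SLOT_ZOMBIE;
                fprintf(stderr, "zombie child pid %ld in slot %d\n",
                        (long)pids[i], i);
                t->last_pid = 0;
                t->repeats = 0;
                continue;                  /* try the next slot down */
            }
            t->repeats++;
        } else {
            t->last_pid = pids[i];
            t->repeats = 1;
        }
        return i;
    }
    return -1;
}
```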
     
-- 
Paul J. Reder
-----------------------------------------------------------
"The strength of the Constitution lies entirely in the determination of each
citizen to defend it.  Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure."
-- Albert Einstein
