httpd-dev mailing list archives

From: "Paul J. Reder" <rede...@raleigh.ibm.com>
Subject: Death by threads.
Date: Wed, 04 Apr 2001 22:13:53 GMT

I have been working to apply the idle worker cleanup patch to the threaded
mpm, but have run into what I believe is the problem that Bill Stoddard
ran into a while back. At first I thought it was my patched code, but after
much debugging I went back to a clean, unpatched CVS build as of this
afternoon and it still happens.

I build a threaded version (using the config.nice from the apache.org build)
and start it up on one server. I then run 14 copies of Jeff Trawick's "b" 
test program on 7 different client machines (2 "b"s each). Each "b" runs with
"-c 100". I am replaying an actual apache.org access log (2.5 million
requests long).

After about 4 1/4 minutes Apache stops serving pages. Taking a look at the server
machine shows that the initial server is still running fine, but all of the
worker threads are either defunct or are now owned by pid=1. There is an entry
about 3 1/2 minutes into the error_log indicating that a segfault occurred:
"...[notice] child pid 21752 exit signal Segmentation fault (11)"
but there is no core file (I have had core files generated during other
builds/runs for other reasons). The only other unusual entries (3 in all at
about 2 1/2 minutes into the run) in the error log are:
"...[warn] (9)Bad file descriptor: setsockopt: (TCP_NODELAY)"

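The two kinds of error_log entries quoted above can be tallied with a short
script. What follows is a minimal Python sketch under stated assumptions: the
function name summarize_error_log is hypothetical, and the patterns match only
the message text shown above, since the timestamp prefix is elided ("...") in
the quoted lines and nothing is assumed about it.

```python
import re
from collections import Counter

# Match only the message portion quoted in the report. The leading part of
# each log line is elided there, so the patterns do not depend on it.
SEGFAULT_RE = re.compile(r"\[notice\] child pid (\d+) exit signal (.+?) \((\d+)\)")
WARN_RE = re.compile(r"\[warn\] \((\d+)\)([^:]+): (.*)")


def summarize_error_log(lines):
    """Return (crashed children, warning counts) for an iterable of log lines.

    crashed  -> list of (pid, signal name, signal number)
    warnings -> Counter keyed by (errno, error string, detail)
    """
    crashed = []
    warnings = Counter()
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if m:
            crashed.append((int(m.group(1)), m.group(2), int(m.group(3))))
            continue
        m = WARN_RE.search(line)
        if m:
            key = (int(m.group(1)), m.group(2).strip(), m.group(3).strip())
            warnings[key] += 1
    return crashed, warnings
```

Feeding it the open error_log file object (for example
summarize_error_log(open(path)), with path chosen by the reader) gives the
pids that died on a signal and a count per distinct warning, which makes it
easier to compare one run against the next.
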
strace shows that the main thread is doing the normal wait/select processing, 
one other thread is sitting at a read, and none of the other threads show 
anything (the defunct ones exit strace immediately). The exact numbers vary
slightly from one run to the next, but this is the general pattern.

I can sometimes get server-status during this time. It shows a large number
of workers (anywhere from 120 to 320), many of which are processing requests
(in "W" or "R" state), while some are in keepalive and some are idle. The
workers that are now owned by pid=1 are all stuck in the scoreboard in
whatever state they were last in ("W", "R", or "K").

Any requests that can still be processed are handled by the workers that are
still functioning (until there are no more functioning workers). Because the
stuck workers are not dead, they are not cleaned up and replaced.
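
One way to list the affected workers without going through server-status is
to read /proc directly. The sketch below is an illustration only, and it
assumes a Linux /proc in which each entry has a status file with Name, State
and PPid fields, and in which each worker thread shows up as its own numeric
entry. The function names, the "httpd" process name, and the parent_pid
parameter are hypothetical choices made for the example.

```python
import os


def parse_status(text):
    """Return the 'Key: value' fields of a /proc/<pid>/status file as a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def classify(fields):
    """Label a process 'defunct', 'orphaned', or 'ok' from its status fields."""
    if fields.get("State", "").startswith("Z"):
        return "defunct"      # zombie: exited but not yet reaped
    if fields.get("PPid") == "1":
        return "orphaned"     # re-parented to pid 1
    return "ok"


def scan_proc(name="httpd", parent_pid=None, proc_root="/proc"):
    """Map pid -> label for every process called `name` under proc_root.

    parent_pid is skipped, because a daemonized initial server is normally
    a child of pid 1 itself and would otherwise be reported as orphaned.
    """
    results = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or (parent_pid is not None and int(entry) == parent_pid):
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = parse_status(f.read())
        except OSError:
            continue  # the process went away while we were scanning
        if fields.get("Name") == name:
            results[int(entry)] = classify(fields)
    return results
```

Running scan_proc(parent_pid=...) with the pid of the initial server at
intervals during a test run would show when workers start turning defunct or
being re-parented, which can then be lined up against the error_log entries.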

The only other tidbit is that I am running rotatelogs to keep from overflowing
the partition. This is all under RedHat Linux 6.1.

Has anyone seen this? Is anyone else running the threaded mpm? I am in the
process of setting up the environment to run strace against the workers so that
I can see what they are doing when they die. Any other ideas?

-- 
Paul J. Reder
-----------------------------------------------------------
"The strength of the Constitution lies entirely in the determination of each
citizen to defend it.  Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure."
-- Albert Einstein
