httpd-dev mailing list archives

From <>
Subject Re: cvs commit: httpd-2.0/server/mpm/threaded threaded.c
Date Tue, 03 Jul 2001 14:05:00 GMT

Okay,  I have just committed another change that should fix this.  More
information below.

> After letting MRPC = 0 version run for an hour I did a SIG_WINCH and finally
> got over the 500 worker boundary. In fact, since all 500 workers were busy
> at the restart they were all left in place and another 500 started (now there
> are 1000 busy workers). After the initial near death experience on the server
> it resumed normal operation. It ran with 1000 busy workers for almost 30 minutes
> before I SIG_WINCHed it again. This time it ran up to about 1350 total workers
> and eventually, after about another 30 minutes, settled down to about 1225 workers.

This was my fault.  My original patch looked like it solved the problem,
and based on the great research Bill did, I understood why it was needed.
Unfortunately, it attacked a symptom, not the real problem.  The real
problem was the way we were counting idle_thread_count.  That has been
fixed now, and the threaded MPM does now create and kill child processes.

> After about another 10 minutes I started killing copies of b. Eventually I
> killed all of the bs and never saw any workers get idle cleaned up.

How did you kill off the b's?

> In addition, the error_log had a number of "long lost child came home!" after
> the first SIG_WINCH and after the second SIG_WINCH I found a number of
> "child pid XXXXX exit signal Segmentation fault (11)" (some of which resulted
> in "long lost child came home!" messages - apparently they came home in a
> body bag).

The "long lost child came home" messages are an unfortunate side effect,
and an easy one to solve.  Basically, because we are replacing one child
with another before the parent's wait_for_child finds the one we replaced,
we lose that pid from the scoreboard, and we get this message.  The easy
solution is to have a one-dimensional array in the parent that keeps
track of all child processes created and watches for them to die.  The
reason to keep this separate from the scoreboard is that the parent doesn't
need this information in shared memory.  The other solution is to move
the pid into the worker_score.  Either one works, but I am not likely to
implement either until after we tag and roll.  This is not a fatal error.
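A minimal C sketch of the first option (a parent-private pid table kept outside shared memory), assuming a fixed upper bound on the number of children. The names `MAX_CHILDREN`, `child_table_add`, `child_table_reap`, and `reap_exited_children` are hypothetical and are not taken from the httpd source:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical upper bound on child processes the parent tracks. */
#define MAX_CHILDREN 64

/* Parent-private table of every child pid ever forked and not yet reaped.
 * It lives in ordinary (not shared) memory because only the parent uses it,
 * and it is independent of scoreboard slots, so replacing a child in the
 * scoreboard does not make the old pid disappear from here. */
static pid_t child_pids[MAX_CHILDREN];
static size_t child_count = 0;

/* Record a newly forked child. Returns 0 on success, -1 if the table is full. */
static int child_table_add(pid_t pid)
{
    assert(pid > 0);
    if (child_count == MAX_CHILDREN)
        return -1;
    child_pids[child_count++] = pid;
    return 0;
}

/* Forget a child that has exited. Returns 1 if the pid was ours,
 * 0 if it was never recorded (a genuinely unknown pid). */
static int child_table_reap(pid_t pid)
{
    for (size_t i = 0; i < child_count; i++) {
        if (child_pids[i] == pid) {
            child_pids[i] = child_pids[--child_count]; /* swap-remove */
            return 1;
        }
    }
    return 0;
}

/* Non-blocking sweep: collect every child that has exited so far and
 * return how many of them were NOT in the table. Only that count would
 * still justify a "long lost child came home!" style message. */
static int reap_exited_children(void)
{
    int unknown = 0;
    pid_t pid;
    while ((pid = waitpid(-1, NULL, WNOHANG)) > 0) {
        if (!child_table_reap(pid))
            unknown++;
    }
    return unknown;
}
```

In this sketch the parent would call `child_table_add()` right after each successful `fork()` and run `reap_exited_children()` alongside its existing wait-for-child handling; a child that was swapped out of its scoreboard slot is still found in the table when it finally exits.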

As for the seg faults, I believe they are caused by something
other than the MPM.  Take a look at the code I modified: none of it runs
in the child process.  Because I believe this is a separate problem, I am not
going to try to fix it right now; we can attack these later, once the MPM
is stable.  Or somebody can attack them now, but it won't be me.  :-)

> Eventually I received a "Child XXXX returned a fatal error... Apache is exiting!"
> message, so all of the top level processes are now owned by pid 1. Apache
> continued serving pages for another hour until I killed it.

This is a problem with the way Apache exits today.  Basically, whenever
Apache exits like this, it should first send a signal to the child
processes.  Today, it doesn't do that.  If the parent decides to die, we
need to be sure the children die too.
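A generic POSIX sketch of "signal the children before the parent dies", assuming the parent has some list of live child pids at hand (for example a table like the one proposed above). The function name `terminate_children` is hypothetical; this is not how httpd's shutdown path is actually written:

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Before the parent exits on a fatal error, ask every known child to
 * terminate and then reap it, so no child is left running (re-parented
 * to init, pid 1) after the parent is gone.
 * Returns the number of children that were successfully signalled. */
static size_t terminate_children(const pid_t *pids, size_t n)
{
    size_t signalled = 0;

    /* First pass: deliver SIGTERM to every child. */
    for (size_t i = 0; i < n; i++) {
        if (kill(pids[i], SIGTERM) == 0)
            signalled++;
    }

    /* Second pass: block until each child has actually exited.
     * waitpid() fails with ECHILD for pids that are not (or no longer)
     * our children, which ends the retry loop. */
    for (size_t i = 0; i < n; i++) {
        while (waitpid(pids[i], NULL, 0) == -1 && errno == EINTR)
            ; /* interrupted by a signal: retry */
    }
    return signalled;
}
```

The blocking second pass assumes children honour SIGTERM; a more defensive version would wait only for a bounded time and then escalate to SIGKILL.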

> Also, the first SIG_WINCH doubled the memory footprint of Apache from 3.5 MB
> (in the first hour it grew from a little under 2 MB to about 3.5) to almost
> 8 MB. Immediately after the second SIG_WINCH the size grew to 12 MB then
> quickly fell back to 8 MB, then continued to grow at a slow rate until I killed it.

This is obviously a memory leak someplace.  I don't know where right now,
and I am not going looking for it.  Again, I didn't allocate any memory in
my patch, I just changed how we interpret the information we have.


Ryan Bloom               
Covalent Technologies
