Mailing-List: contact new-httpd-help@apache.org; run by ezmlm
Precedence: bulk
Reply-To: new-httpd@apache.org
Sender: rederpj@raleigh.ibm.com
Message-ID: <3B42042C.5E1BF5FC@raleigh.ibm.com>
Date: Tue, 03 Jul 2001 13:43:08 -0400
From: "Paul J. Reder" <rederpj@raleigh.ibm.com>
MIME-Version: 1.0
To: new-httpd@apache.org
Subject: Re: cvs commit: httpd-2.0/server/mpm/threaded threaded.c
References: <Pine.LNX.4.30.0107030654420.11930-100000@koj.rkbloom.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

rbb@covalent.net wrote:
> > After about another 10 minutes I started killing copies of b. Eventually I
> > killed all of the bs and never saw any workers get idle cleaned up.
> 
> How did you kill off the b's?

The old fashioned way - ctrl-c.

> 
> > In addition, the error_log had a number of "long lost child came home!" after
> > the first SIG_WINCH and after the second SIG_WINCH I found a number of
> > "child pid XXXXX exit signal Segmentation fault (11)" (some of which resulted
> > in "long lost child came home!" messages - apparently they came home in a
> > body bag).
> 
> The "long lost child came home" messages are an unfortunate side effect,
> and an easy one to solve.  Basically, because we are replacing one child
> with another before the parent's wait_for_child finds the one we replaced,
> we lose that pid from the scoreboard, and we get this message.  The easy
> solution, is to have a single dimensional array in the parent that keeps
> track of all child processes created, and watches for them to die.  The
> reason to separate this from the scoreboard, is that the parent doesn't
> need this information in shared memory.  The other solution, is to move
> the pid to the worker_score.  Either one works, but I am not likely to
> implement either until after we tag and roll.  This is not a fatal error
> IMHO.

Do I misunderstand here? There are quiescing workers whose slot has been
reused? If that is true, doesn't that mean that when the worker finishes
its task and tries to update its slot with request info that it is
overwriting the info for the currently active worker in that same slot?
This seems problematic to me.
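
For reference, here is a minimal sketch of the parent-side pid array
suggested in the quoted text above. It is not code from threaded.c; the
names track_child, reap_child, and MAX_TRACKED_CHILDREN are hypothetical
and chosen only for illustration. The table lives in ordinary parent
memory rather than in the scoreboard, so a child whose scoreboard slot
has been reused can still be recognized when it exits.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical limit for this sketch; not an httpd configuration value. */
#define MAX_TRACKED_CHILDREN 64

/* Parent-private table in ordinary memory (not shared memory).
 * An entry of 0 means the slot is free. */
static pid_t tracked_children[MAX_TRACKED_CHILDREN];

/* Record a newly created child.
 * Returns 0 on success, -1 if the table is full. */
int track_child(pid_t pid)
{
    assert(pid > 0);
    for (size_t i = 0; i < MAX_TRACKED_CHILDREN; i++) {
        if (tracked_children[i] == 0) {
            tracked_children[i] = pid;
            return 0;
        }
    }
    return -1;
}

/* Call when the parent's wait loop reports that `pid` has exited.
 * Returns 1 if the pid was one of ours (and forgets it),
 * 0 if the pid is unknown to the parent. */
int reap_child(pid_t pid)
{
    if (pid <= 0) {
        return 0;
    }
    for (size_t i = 0; i < MAX_TRACKED_CHILDREN; i++) {
        if (tracked_children[i] == pid) {
            tracked_children[i] = 0;
            return 1;
        }
    }
    return 0;
}
```

With something like this, the parent would only need to log an unknown
child when reap_child() returns 0, regardless of what the scoreboard
slot currently holds.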

> 
> As for the seg faults.  I believe those seg faults are caused by something
> other than the MPM.  Take a look at the code I modified, none of it runs
> in the child process.  Because I believe this is another problem, I am not
> going to try to fix it right now, we can attack these later, once the MPM
> is stable.  Or somebody can attack them now, but it won't be me.  :-)

Understood. I wasn't trying to pin the blame on you, just being complete in
identifying the problems I saw. For example, it is difficult for Apache to
do perform_idle_server_maintenance if there is no server any more, therefore
there won't be any idle cleanup after it's gone...

> 
> > Eventually I received a "Child XXXX returned a fatal error... Apache is exiting!"
> > message, so all of the top level processes are now owned by pid 1. Apache
> > continued serving pages for another hour until I killed it.
> 
> This is a problem with the way Apache exits today.  Basically, whenever
> Apache exits like this, it should first send a signal to the child
> processes.  Today, it doesn't do that.  If the parent decides to die, we
> need to be sure the children die too.

Ok, I agree here too. Again, just stating problems, not saying you had to fix it.
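
A rough sketch of the "signal the children before the parent exits"
idea from the quoted text, assuming the parent keeps a plain list of
child pids with 0 marking an unused entry. The function name
signal_children is hypothetical, and this is not what Apache does
today; it only illustrates the step using the standard kill() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/* Send `sig` to every child pid in the list; entries <= 0 are skipped.
 * Returns the number of children that were signalled successfully. */
size_t signal_children(const pid_t *pids, size_t count, int sig)
{
    size_t signalled = 0;

    assert(pids != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* kill() returns 0 on success, -1 if the pid is gone
         * or we lack permission to signal it. */
        if (pids[i] > 0 && kill(pids[i], sig) == 0) {
            signalled++;
        }
    }
    return signalled;
}
```

The parent's fatal-error path would call something like
signal_children(pids, count, SIGTERM) just before exiting, so the
children do not end up owned by pid 1.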

> > Also, the first SIG_WINCH doubled the memory footprint of Apache from 3.5 MB
> > (in the first hour it grew from a little under 2 MB to about 3.5) to almost
> > 8 MB. Immediately after the second SIG_WINCH the size grew to 12 MB then
> > quickly fell back to 8 MB, then continued to grow at a slow rate until I killed it.
> 
> This is obviously a memory leak someplace.  I don't know where right now,
> and I am not going looking for it.  Again, I didn't allocate any memory in
> my patch, I just changed how we interpret the information we have.

I thought this was relatively well known. I was just pointing out that, to my
knowledge, running the threaded MPM with MaxRequestsPerChild (MRPC) set to 0
has never been recommended, for this reason. Someday these leaks need to be
tracked down, if possible.


-- 
Paul J. Reder
-----------------------------------------------------------
"The strength of the Constitution lies entirely in the determination of each
citizen to defend it.  Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure."
-- Albert Einstein