Return-Path:
Delivered-To: apmail-new-httpd-archive@apache.org
Received: (qmail 39964 invoked by uid 500); 3 Jul 2001 17:44:15 -0000
Mailing-List: contact new-httpd-help@apache.org; run by ezmlm
Precedence: bulk
Reply-To: new-httpd@apache.org
list-help:
list-unsubscribe:
list-post:
Delivered-To: mailing list new-httpd@apache.org
Received: (qmail 39312 invoked from network); 3 Jul 2001 17:44:02 -0000
Sender: rederpj@raleigh.ibm.com
Message-ID: <3B42042C.5E1BF5FC@raleigh.ibm.com>
Date: Tue, 03 Jul 2001 13:43:08 -0400
From: "Paul J. Reder"
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdksecure i686)
X-Accept-Language: en
MIME-Version: 1.0
To: new-httpd@apache.org
Subject: Re: cvs commit: httpd-2.0/server/mpm/threaded threaded.c
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Rating: h31.sny.collab.net 1.6.2 0/1000/N

rbb@covalent.net wrote:

> > After about another 10 minutes I started killing copies of b. Eventually I
> > killed all of the bs and never saw any workers get idle cleaned up.
>
> How did you kill off b's?

The old fashioned way - ctrl-c.

> > In addition, the error_log had a number of "long lost child came home!" after
> > the first SIG_WINCH and after the second SIG_WINCH I found a number of
> > "child pid XXXXX exit signal Segmentation fault (11)" (some of which resulted
> > in "long lost child came home!" messages - apparently they came home in a
> > body bag).
>
> The "long lost child came home" messages are an unfortunate side effect,
> and an easy one to solve. Basically, because we are replacing one child
> with another before the parent's wait_for_child finds the one we replaced,
> we lose that pid from the scoreboard, and we get this message. The easy
> solution, is to have a single dimensional array in the parent that keeps
> track of all child processes created, and watches for them to die.
> The reason to separate this from the scoreboard, is that the parent doesn't
> need this information in shared memory. The other solution, is to move
> the pid to the worker_score. Either one works, but I am not likely to
> implement either until after we tag and roll. This is not a fatal error
> IMHO.

Do I misunderstand here? There are quiescing workers whose slot has been
reused? If that is true, doesn't that mean that when the worker finishes its
task and tries to update its slot with request info, it is overwriting
the info for the currently active worker in that same slot? This seems
problematic to me.

>
> As for the seg faults. I believe those seg faults are caused by something
> other than the MPM. Take a look at the code I modified, none of it runs
> in the child process. Because I believe this is another problem, I am not
> going to try to fix it right now, we can attack these later, once the MPM
> is stable. Or somebody can attack them now, but it won't be me. :-)

Understood. I wasn't trying to pin the blame on you, just being complete in
identifying the problems I saw. For example, it is difficult for Apache to do
perform_idle_server_maintenance if there is no server any more; therefore there
won't be any idle cleanup after it's gone...

>
> > Eventually I received a "Child XXXX returned a fatal error... Apache is exiting!"
> > message, so all of the top level processes are now owned by pid 1. Apache
> > continued serving pages for another hour until I killed it.
>
> This is a problem with the way Apache exits today. Basically, whenever
> Apache exits like this, it should first send a signal to the child
> processes. Today, it doesn't do that. If the parent decides to die, we
> need to be sure the children die too.

Ok, I agree here too. Again, just stating problems, not saying you had to fix it.
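
For illustration only, the parent-side bookkeeping discussed above could be
sketched in C roughly as follows. This is not code from httpd: child_table,
MAX_CHILDREN, and the child_table_* functions are hypothetical names, and the
fixed-size array is an assumption made to keep the example short. The sketch
keeps a private (non-shared) list of child pids, treats a pid that is not in
the list as the "long lost child" case, and can signal every tracked child
before the parent exits.

```c
#define _POSIX_C_SOURCE 200809L  /* for kill() under strict C standards */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_CHILDREN 64  /* hypothetical cap, not an httpd setting */

/* Parent-private list of every child pid spawned and not yet reaped.
 * It lives in ordinary process memory, not in the shared scoreboard. */
typedef struct {
    pid_t pids[MAX_CHILDREN];
    size_t count;
} child_table;

static void child_table_init(child_table *t)
{
    assert(t != NULL);
    t->count = 0;
}

/* Record a newly forked child. Returns 0 on success, -1 if the table is full. */
static int child_table_add(child_table *t, pid_t pid)
{
    assert(t != NULL);
    if (t->count == MAX_CHILDREN) {
        return -1;
    }
    t->pids[t->count++] = pid;
    return 0;
}

/* Call with a pid reported by wait()/waitpid(). Returns 1 and forgets the
 * pid if it was tracked; returns 0 for an unknown pid, which is where a
 * "long lost child" style message would otherwise be logged. */
static int child_table_reap(child_table *t, pid_t pid)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->pids[i] == pid) {
            t->pids[i] = t->pids[--t->count];  /* fill the hole with the last entry */
            return 1;
        }
    }
    return 0;
}

/* Send sig to every tracked child, e.g. just before the parent exits on a
 * fatal error. Returns how many kill() calls succeeded. */
static size_t child_table_signal_all(const child_table *t, int sig)
{
    assert(t != NULL);
    size_t delivered = 0;
    for (size_t i = 0; i < t->count; i++) {
        if (kill(t->pids[i], sig) == 0) {
            delivered++;
        }
    }
    return delivered;
}
```

In this sketch the parent would call child_table_add() after each fork,
child_table_reap() for each pid returned by wait(), and
child_table_signal_all() with something like SIGTERM on its way out. Moving
the pid into worker_score instead, as also suggested above, would be a
different design and is not shown here.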

> > Also, the first SIG_WINCH doubled the memory footprint of Apache from 3.5 MB
> > (in the first hour it grew from a little under 2 MB to about 3.5) to almost
> > 8 MB. Immediately after the second SIG_WINCH the size grew to 12 MB then
> > quickly fell back to 8 MB, then continued to grow at a slow rate until I killed it.
>
> This is obviously a memory leak someplace. I don't know where right now,
> and I am not going looking for it. Again, I didn't allocate any memory in
> my patch, I just changed how we interpret the information we have.

I thought this was relatively well known. I was just pointing out that, to my
knowledge, it has never been recommended to run the threaded mpm with MRPC=0
for this reason. Someday these leaks need to be tracked down, if possible.

-- 
Paul J. Reder
-----------------------------------------------------------
"The strength of the Constitution lies entirely in the determination of each
citizen to defend it. Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure."
-- Albert Einstein