From: Chris Darroch
Organization: Pearson CTG/CMG
Date: Tue, 11 Apr 2006 16:50:57 -0400
To: httpd-dev@apache.org
Subject: [PATCH] #39275 MaxClients on startup [Was: Bug in 2.0.56-dev]
Message-ID: <443C16B1.1090706@pearsoncmg.com>
References: <20060408221021.GA10072@none.at>

Hi --

Alexander Lazic wrote:

>> After 'make install' I started Apache; then some seconds later I got
>> the message '...MaxClients reached...', but there was no entry in
>> the access log, and nobody had made a request to this server.

Jeff Trawick wrote:

> There are problems accounting for child processes which are trying to
> initialize that result in the parent thinking it needs to create more
> children.  The less harmful flavor is when it thinks (incorrectly) it
> is already at MaxClients and issues the "reached MaxClients" message.
> More disturbing is when MaxClients is very high and the parent keeps
> creating new children using exponential ramp-up.  That can be very
> painful.

I have been seeing something similar with 2.2.0 using the worker MPM:
with the following settings, I get more than 10 child processes
initializing immediately (up to 15, for example), which then drop back
to 10.  I also see the "server reached MaxClients" message right after
httpd startup, although nothing is connecting yet.

    StartServers         10
    MaxClients          150
    MinSpareThreads      25
    MaxSpareThreads     100
    ThreadsPerChild      10

In my case, the problem relates to how long the child_init phase takes
to execute.  I can "tune" this by raising DBDMin (and DBDKeep) so that
mod_dbd attempts to open increasingly large numbers of DB connections
during child_init.  With DBDMin set to 0 or 1, all is well; no funny
behaviour.  With DBDMin and DBDKeep up at 3, that's when (for me)
things go pear-shaped.
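To be concrete, the mod_dbd settings involved are along these lines
(the driver and connection parameters here are placeholders for my
local setup; DBDMin and DBDKeep are the values that matter):

    DBDriver     pgsql
    DBDParams    "host=localhost dbname=test user=apache"
    DBDPersist   On
    DBDMin       3
    DBDKeep      3
    DBDMax       10

Each child then tries to open three connections inside child_init,
which delays startup long enough to trigger what follows.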
In server/mpm/worker/worker.c, after make_child() creates a child
process, it immediately sets the pid value in that child's scoreboard
parent slot.  The main process then goes into server_main_loop() and
begins executing perform_idle_server_maintenance() every second; this
looks at any process with a non-zero pid in the scoreboard and assumes
that any of its worker threads marked SERVER_DEAD are, in fact, dead.

However, if the child processes are starting "slowly" because
ap_run_child_init() in child_main() is taking its time, then
start_threads() hasn't even been run yet, so the threads aren't marked
SERVER_STARTING -- they're just set to 0, the scoreboard's default
value.  But 0 == SERVER_DEAD, so the main process sees a lot of
apparently dead worker threads and begins spawning new child
processes, up to MaxClients/ThreadsPerChild of them in the worst case.

At that point, when no worker threads have started yet but all
possible child processes have been spawned (and are working through
their child_init phases), the following branch in
perform_idle_server_maintenance() is taken and the "server reached
MaxClients" message is printed, even though the server hasn't started
accepting connections yet:

    else if (idle_thread_count < min_spare_threads) {
        /* terminate the free list */
        if (free_length == 0) {

I considered wedging another thread status into the scoreboard,
between SERVER_DEAD (the initial value) and SERVER_STARTING;
make_child() would set all the thread slots to this value, and
start_threads() would later flip them to SERVER_STARTING after
actually creating the worker threads.  That would have various ripple
effects on other bits of httpd, though -- mod_status, other MPMs, etc.

So instead I tried adding a status field to the process_score
scoreboard structure, and making changes to worker.c such that this
field is set by make_child() to SERVER_STARTING and then changed to
SERVER_READY once the start thread that runs start_threads() has done
its initial work.  During this period, while the new child process is
running ap_run_child_init() and friends,
perform_idle_server_maintenance() just counts that child process's
worker threads as all being effectively in SERVER_STARTING mode.  Once
the process_score.status field changes to SERVER_READY,
perform_idle_server_maintenance() begins to look at the individual
thread status values.  (There's a rough sketch of the idea at the end
of this mail.)

Any thoughts?  The patch in Bugzilla doesn't address other MPMs that
might see the same behaviour (event, and maybe prefork?):

    http://issues.apache.org/bugzilla/show_bug.cgi?id=39275

It also doesn't necessarily play ideally well with the fact that new
child processes can gradually take over thread slots in the scoreboard
from a gracefully exiting old process -- the count of idle threads for
that process will be pegged (only by perform_idle_server_maintenance())
at ap_threads_per_child until the new process creates its first new
worker thread.  But that may be just fine....

I'll keep poking around and testing, and maybe a better idea will
present itself.

Chris.
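P.S.  To make the "0 == SERVER_DEAD" point concrete, here's a tiny
standalone toy (not httpd code; only the constants mirror
scoreboard.h) showing what perform_idle_server_maintenance() sees
before start_threads() has run:

    /* Toy model: a new child's thread slots are zero-filled, and 0
     * happens to be SERVER_DEAD, so a maintenance pass that runs
     * before start_threads() sees every worker as dead. */
    #include <stdio.h>
    #include <string.h>

    #define SERVER_DEAD     0   /* numeric values as in scoreboard.h */
    #define SERVER_STARTING 1
    #define SERVER_READY    2

    #define THREADS_PER_CHILD 10

    int main(void)
    {
        int thread_status[THREADS_PER_CHILD];
        int dead = 0;

        /* make_child() has forked and set the slot's pid, but the
         * child is still inside ap_run_child_init(), so nothing has
         * been marked SERVER_STARTING yet. */
        memset(thread_status, 0, sizeof(thread_status));

        for (int j = 0; j < THREADS_PER_CHILD; j++) {
            if (thread_status[j] == SERVER_DEAD) {
                ++dead;
            }
        }

        /* Prints "10 of 10": the parent concludes it needs another
         * child, even though this one is merely slow to start. */
        printf("threads seen as dead: %d of %d\n",
               dead, THREADS_PER_CHILD);
        return 0;
    }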
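And here's the shape of the patch idea in miniature -- a sketch only,
with surrounding code omitted and details simplified, not the literal
diff from Bugzilla:

    /* make_child(): after forking, mark the whole process as
     * starting, so the maintenance loop won't mistake its zeroed
     * (SERVER_DEAD) thread slots for dead workers. */
    ap_scoreboard_image->parent[slot].pid = pid;
    ap_scoreboard_image->parent[slot].status = SERVER_STARTING;

    /* start_threads(): once the initial worker threads have been
     * created, hand control back to the per-thread status values. */
    ap_scoreboard_image->parent[process_slot].status = SERVER_READY;

    /* perform_idle_server_maintenance(): while a child is still
     * starting, count all of its threads as effectively
     * SERVER_STARTING (and hence idle) rather than trusting the
     * zeroed per-thread slots. */
    ps = &ap_scoreboard_image->parent[i];
    if (ps->pid != 0 && ps->status == SERVER_STARTING) {
        idle_thread_count += ap_threads_per_child;
    }
    else {
        /* ... examine each worker thread's status as before ... */
    }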