Date: Wed, 12 Apr 2006 10:04:51 -0400
From: "Jeff Trawick"
To: dev@httpd.apache.org
Subject: 
Re: [PATCH] #39275 MaxClients on startup [Was: Bug in 2.0.56-dev]

On 4/11/06, Chris Darroch wrote:
> Hi --
>
> Alexander Lazic wrote:
>
> >> After 'make install' i started apache, then some seconds later i got the
> >> message '...MaxClients reached...' but there was no entry in the access
> >> log, and nobody have make a request to this server.
>
> Jeff Trawick wrote:
>
> > There are problems accounting for child processes which are trying to
> > initialize that result in the parent thinking it needs to create more
> > children. The less harmful flavor is when it thinks (incorrectly) it
> > is already at MaxClients and issues the "reached MaxClients" message.
> > More disturbing is when MaxClients is very high and the parent keeps
> > creating new children using exponential ramp-up. That can be very
> > painful.
>
> I have been seeing something similar with 2.2.0 using the worker
> MPM, where with the following settings, I get over 10 child processes
> initializing immediately (e.g., up to 15), and then they drop back to
> 10. I see the "server reached MaxClients" message as well right
> after httpd startup, although nothing is connecting yet.
>
>     StartServers 10
>     MaxClients 150
>     MinSpareThreads 25
>     MaxSpareThreads 100
>     ThreadsPerChild 10
>
> In my case, the problem relates to how long the child_init phase
> takes to execute. I can "tune" this by raising DBDMin (and DBDKeep)
> so that mod_dbd attempts to open increasingly large numbers of
> DB connections during child_init. With DBDMin set to 0 or 1,
> all is well; no funny behaviour.
> Up at DBDMin and DBDKeep at 3,
> that's when (for me) things go pear-shaped.
>
> In server/mpm/worker/worker.c, after make_child() creates a
> child process it immediately sets the scoreboard parent slot's pid
> value. The main process goes into server_main_loop() and begins
> executing perform_idle_server_maintenance() every second; this
> looks at any process with a non-zero pid in the scoreboard and
> assumes that any of its worker threads marked SERVER_DEAD are,
> in fact, dead.
>
> However, if the child processes are starting "slowly" because
> ap_run_child_init() in child_main() is taking its time, then
> start_threads() hasn't even been run yet, so the threads aren't
> marked SERVER_STARTING -- they're just set to 0 as the default
> value. But 0 == SERVER_DEAD, so the main process sees a lot
> of dead worker threads and begins spawning new child processes,
> up to MaxClients/ThreadsPerChild in the worst case. In this case,
> when no worker threads have started yet, but all possible child
> processes have been spawned (and are working through their
> child_init phases), then the following is true and the
> "server reached MaxClients" message is printed, even though
> the server hasn't started accepting connections yet:
>
>     else if (idle_thread_count < min_spare_threads) {
>         /* terminate the free list */
>         if (free_length == 0) {
>
> I considered wedging another thread status into the
> scoreboard, between SERVER_DEAD (the initial value) and
> SERVER_STARTING. The make_child() would set all the thread
> slots to this value and start_threads() would later flip them
> to SERVER_STARTING after actually creating the worker threads.
>
> That would have various ripple effects on other bits of
> httpd, though, like mod_status and other MPMs, etc.

In other words, breaks binary compatibility...
Other modules should see the threads in SERVER_STARTING state anyway.
IOW, I think we should set state to SERVER_STARTING before we do any
potentially-lengthy work like running child-init hooks so that the
state as seen from the outside makes sense. That also means resetting
the state if something fails (e.g., pthread_create()). But that isn't
needed for proper operation of the MPM, which is what we're after at
the moment... But it would be great to be able to see from mod_status
that a child is taking way too long in the SERVER_STARTING state.

> So instead
> I tried adding a status field to the process_score scoreboard
> structure, and making the following changes to worker.c such that
> this field is set by make_child to SERVER_STARTING and then
> changed to SERVER_READY once the start thread that runs
> start_threads() has done its initial work.

I was considering adding something to process_score for this issue but
I decided against it, hopefully for a bogus reason -- binary
compatibility breakage. This isn't binary compatibility breakage since
we provide ap_get_scoreboard_process() for modules to retrieve a
process_score structure, and if fields get added to the end for the
use of the MPM then no worries since we don't support modules creating
their own process_score structures and stuffing them in the
scoreboard. (confirmation from the crowd?)

Instead of "unsigned char status" I'd prefer something like

    apr_int32_t mpm_state; /* internal state for MPM; meaning may change
                            * in the future, so not for use by other modules */

If a particular MPM wants to store SERVER_STARTING/SERVER_DEAD/etc.
then fine.

> During this period, while the new child process is running
> ap_run_child_init() and friends, perform_idle_server_maintenance()
> just counts that child process's worker threads as all being
> effectively in SERVER_STARTING mode.
> Once the process_score.status
> field changes to SERVER_READY, perform_idle_server_maintenance()
> begins to look at the individual thread status values.
>
> Any thoughts? The patch in Bugzilla doesn't address other
> MPMs that might see the same behaviour (event, and maybe prefork?)
>
> http://issues.apache.org/bugzilla/show_bug.cgi?id=39275
>
> It also doesn't necessarily play ideally well with the fact that
> new child processes can gradually take over thread slots in
> the scoreboard from a gracefully exiting old process -- the
> count of idle threads for that process will be pegged (only
> by perform_idle_server_maintenance()) at ap_threads_per_child
> until the new process creates its first new worker thread.
> But, that may be just fine.... I'll keep poking around and
> testing and maybe a better idea will present itself.

A gracefully exiting process has lost its process score field and
gradually loses its worker_score fields as well. Gracefully exiting
threads aren't counted as active or idle. I think this means we can
end up creating a new process to make up for gracefully exiting
threads, a process we won't necessarily need once those threads finish
and new threads in that process's scoreboard slot take over.
Unavoidable, since gracefully exiting threads can take forever.
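
To make the accounting problem concrete, here is a toy model of the
counting discussed above. It is only a sketch and not httpd code: the
toy_* names, the struct layout and the constants are all made up for
illustration, and the real scoreboard and
perform_idle_server_maintenance() do a good deal more. It assumes a
per-process state field along the lines of the mpm_state idea, and
compares a count that trusts the zeroed thread slots with one that
skips processes still marked as starting.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for scoreboard values; only the fact that the
 * "dead" value is 0 matters for this illustration. */
enum { TOY_DEAD = 0, TOY_STARTING = 1, TOY_READY = 2 };

#define TOY_THREADS_PER_CHILD 10

/* Hypothetical, simplified process slot; not httpd's process_score. */
struct toy_process {
    int pid;                 /* 0 means the slot is unused */
    int mpm_state;           /* per-process state, private to the MPM */
    unsigned char thread_status[TOY_THREADS_PER_CHILD];
};

/* Count thread slots that look dead in slots with a non-zero pid.
 * When use_process_state is non-zero, a process still in TOY_STARTING
 * is skipped, i.e. its threads are treated as starting, not dead. */
static int toy_count_dead_threads(const struct toy_process *procs,
                                  int nprocs, int use_process_state)
{
    int dead = 0;

    assert(procs != NULL);
    for (int i = 0; i < nprocs; i++) {
        if (procs[i].pid == 0) {
            continue;   /* no child in this slot */
        }
        if (use_process_state && procs[i].mpm_state == TOY_STARTING) {
            continue;   /* child is still in its child_init phase */
        }
        for (int j = 0; j < TOY_THREADS_PER_CHILD; j++) {
            if (procs[i].thread_status[j] == TOY_DEAD) {
                dead++;
            }
        }
    }
    return dead;
}

/* Startup scenario: two children forked, pid recorded by the parent,
 * thread slots still zeroed because no worker threads exist yet. */
static void toy_startup_scenario(struct toy_process *procs, int nprocs)
{
    assert(procs != NULL && nprocs >= 2);
    memset(procs, 0, (size_t)nprocs * sizeof(*procs));
    procs[0].pid = 1001;
    procs[0].mpm_state = TOY_STARTING;
    procs[1].pid = 1002;
    procs[1].mpm_state = TOY_STARTING;
}
```

With two children still initializing and 10 thread slots each, the
count that ignores the process state comes out at 20 "dead" threads,
while the state-aware count is 0, so the parent in this toy model has
no reason to spawn more children.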