Date: Thu, 25 Apr 2002 09:57:41 -0700
From: Aaron Bannert
To: dev@httpd.apache.org
Subject: worker MPM bugs (was Re: [PATCH] Possible fix for worker MPM performance problem)

On Thu, Apr 25, 2002 at 11:30:54AM -0400, Bill Stoddard wrote:
> Would someone care to see if this fixes the worker MPM performance problem reported
> earlier on the list (requests-per-second dropping when clients exceeded ThreadsPerChild)?
> This patch defers starting the listener until -all- the workers have started.

I'm not really sure how this would fix the performance problems, and given the current theory it might even exacerbate them.

The current hypothesis is that when we run out of available workers in all available children and are waiting for a new child to be spawned, connections continue to be accepted and placed in a queue*, so they cannot be serviced immediately once the new child has started.

A simple fix would be to prevent the queue* from accepting more connections until an idle worker thread is available.
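
A minimal sketch of the "don't accept until a worker is idle" idea, assuming plain POSIX threads rather than APR. The names idle_gate_t and gate_* are hypothetical, not httpd's API. The terminated flag shows one way the new blocking point could be released on shutdown or restart, which is the concern raised below.

```c
/* Hypothetical sketch: gate the listener on a count of idle workers.
 * Plain pthreads are assumed here; httpd itself would use APR primitives. */
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  idle_cv;
    int             idlers;     /* workers currently waiting for a connection */
    int             terminated; /* set on shutdown/restart to release the listener */
} idle_gate_t;

static void gate_init(idle_gate_t *g) {
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->idle_cv, NULL);
    g->idlers = 0;
    g->terminated = 0;
}

/* Worker side: announce that this thread is idle and can take a connection. */
static void gate_worker_idle(idle_gate_t *g) {
    pthread_mutex_lock(&g->lock);
    g->idlers++;
    pthread_cond_signal(&g->idle_cv);
    pthread_mutex_unlock(&g->lock);
}

/* Listener side: call before accept(). Blocks until a worker is idle,
 * then reserves it by decrementing the count.
 * Returns 0 on success, -1 if the gate was terminated. */
static int gate_wait_for_idler(idle_gate_t *g) {
    pthread_mutex_lock(&g->lock);
    while (g->idlers == 0 && !g->terminated)
        pthread_cond_wait(&g->idle_cv, &g->lock);
    if (g->terminated) {
        pthread_mutex_unlock(&g->lock);
        return -1;
    }
    g->idlers--;
    assert(g->idlers >= 0); /* a reservation never exceeds the idle count */
    pthread_mutex_unlock(&g->lock);
    return 0;
}

/* Shutdown path: wake a listener blocked in gate_wait_for_idler(). */
static void gate_terminate(idle_gate_t *g) {
    pthread_mutex_lock(&g->lock);
    g->terminated = 1;
    pthread_cond_broadcast(&g->idle_cv);
    pthread_mutex_unlock(&g->lock);
}
```

In this sketch a listener loop would call gate_wait_for_idler() before each accept() and stop looping when it returns -1; each worker would call gate_worker_idle() just before it waits for its next connection.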

I have hesitated to make this change because it would alter the places where the listener thread may enter blocking calls, and would probably break graceful and non-graceful restarts. If I get a little time I will try to look into this again this weekend.

* When I say "queue" I really mean stack. In thinking about this problem over the last few days I realized that we should convert back to a true FIFO; otherwise a request can sit at the bottom of the stack for a long time before it is serviced.

Summary of worker bugs that need to be fixed:

- convert fd_queue back into a FIFO
- add a counter that blocks ap_queue_pop() until there are available workers (without breaking restarts/shutdown)
- add a way to track open socket descriptors; when we get the signal to do a hard shutdown of the server, walk this set and close the fds so we can halt any long-running requests

-aaron
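
The fd_queue ordering fix from the summary, serving the oldest queued connection first instead of the newest, could look roughly like the ring buffer below. This is a sketch with hypothetical names (fd_fifo_t, fifo_push, fifo_pop) using plain POSIX threads. It is not the actual fd_queue code. Because pop always takes the oldest entry, a connection cannot be buried under newer ones the way it can at the bottom of a stack.

```c
/* Hypothetical sketch: fixed-capacity FIFO of accepted socket descriptors. */
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 64

typedef struct {
    int             fds[QUEUE_CAP];
    int             head;  /* index of the oldest entry (next to pop) */
    int             count; /* number of queued descriptors */
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
} fd_fifo_t;

static void fifo_init(fd_fifo_t *q) {
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Listener side: append at the tail, blocking while the buffer is full. */
static void fifo_push(fd_fifo_t *q, int fd) {
    assert(fd >= 0); /* only valid descriptors are queued */
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->fds[(q->head + q->count) % QUEUE_CAP] = fd;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Worker side: remove from the head, so the oldest connection is served first. */
static int fifo_pop(fd_fifo_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int fd = q->fds[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return fd;
}
```

A stack would instead push and pop at the same end, which is cheaper to reason about for cache warmth but lets the first entry wait indefinitely while newer connections keep arriving.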