Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@httpd.apache.org
Date: Thu, 25 Apr 2002 09:57:41 -0700
From: Aaron Bannert <aaron@clove.org>
To: dev@httpd.apache.org
Subject: worker MPM bugs (was Re: [PATCH] Possible fix for worker MPM
 performance problem)
Message-ID: <20020425095741.P8212@clove.org>
Mail-Followup-To: Aaron Bannert <aaron@clove.org>, dev@httpd.apache.org
References: <004c01c1ec6e$338f94d0$50381b09@sashimi>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <004c01c1ec6e$338f94d0$50381b09@sashimi>
User-Agent: Mutt/1.3.23i

On Thu, Apr 25, 2002 at 11:30:54AM -0400, Bill Stoddard wrote:
> Would someone care to see if this fixes the worker MPM performance problem reported
> earlier on the list (requests-per-second dropping when clients exceeded ThreadsPerChild)?
> This patch defers starting the listener until -all- the workers have started.

I'm not really sure how this would fix the performance problems, and given
the current theory it might even exacerbate them. The current hypothesis
is that when we run out of available workers in all available children
and are waiting for a new child to be spawned, the busy children keep
accepting connections and placing them in their queue*. Because each
queue belongs to a single child, those connections can't be picked up by
the new child's idle workers once it starts; they have to wait for a
worker in their own child to free up.

A simple fix would be to prevent the listener from accepting more
connections into the queue* until there is an idle worker thread
available. I have hesitated to make this change because it would alter
the places where the listener thread may enter blocking calls, and would
probably break graceful and non-graceful restarts. If I get a little
time I will try to look into this again this weekend.
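A minimal sketch of that gate, assuming plain pthreads rather than APR:
workers bump a counter when they go idle, and the listener calls
idle_gate_wait_for_idler() before accept(). The restart concern is
handled by idle_gate_terminate(), which wakes a blocked listener so it
can exit. All of these names are hypothetical, not existing httpd code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical idle-worker gate; not part of httpd or APR. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  became_idle;  /* signalled when a worker goes idle */
    int idle_workers;             /* workers ready to take a connection */
    int terminating;              /* set on restart/shutdown */
} idle_gate_t;

int idle_gate_init(idle_gate_t *g)
{
    g->idle_workers = 0;
    g->terminating = 0;
    if (pthread_mutex_init(&g->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&g->became_idle, NULL) != 0) {
        pthread_mutex_destroy(&g->lock);
        return -1;
    }
    return 0;
}

/* Called by a worker just before it waits for a connection. */
void idle_gate_worker_idle(idle_gate_t *g)
{
    pthread_mutex_lock(&g->lock);
    g->idle_workers++;
    pthread_cond_signal(&g->became_idle);
    pthread_mutex_unlock(&g->lock);
}

/* Called by the listener before accept(). Blocks until some worker is
 * idle and reserves it. Returns 0 on success, -1 if terminating. */
int idle_gate_wait_for_idler(idle_gate_t *g)
{
    int rv = 0;
    pthread_mutex_lock(&g->lock);
    while (g->idle_workers == 0 && !g->terminating)
        pthread_cond_wait(&g->became_idle, &g->lock);
    if (g->terminating)
        rv = -1;                  /* stop accepting; let the listener exit */
    else
        g->idle_workers--;        /* this worker is now spoken for */
    pthread_mutex_unlock(&g->lock);
    return rv;
}

/* Called on graceful or hard restart so a blocked listener wakes up. */
void idle_gate_terminate(idle_gate_t *g)
{
    pthread_mutex_lock(&g->lock);
    g->terminating = 1;
    pthread_cond_broadcast(&g->became_idle);
    pthread_mutex_unlock(&g->lock);
}
```

The point of the terminate flag is that the listener's new blocking
point stays interruptible, which is exactly where restarts would
otherwise get stuck.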

* When I say "queue" I really mean stack. In thinking about this problem
over the last few days I realized that we should convert back to a true
FIFO; otherwise it is possible for a request to sit at the bottom of the
stack for a long time before it is serviced.
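A minimal sketch of a true FIFO, assuming a fixed-capacity ring buffer
of plain int descriptors. fd_fifo_t and its functions are hypothetical
and are not the fd_queue API. Pushing at the tail and popping from the
head hands out the oldest accepted connection first, so nothing can
sink to the bottom the way it can in a stack.

```c
#include <stddef.h>

#define FD_FIFO_CAPACITY 8        /* hypothetical fixed capacity */

typedef struct {
    int    fds[FD_FIFO_CAPACITY];
    size_t head;                  /* index of the oldest element */
    size_t count;                 /* number of elements stored */
} fd_fifo_t;

void fd_fifo_init(fd_fifo_t *q)
{
    q->head = 0;
    q->count = 0;
}

/* Append at the tail. Returns 0 on success, -1 if full. */
int fd_fifo_push(fd_fifo_t *q, int fd)
{
    if (q->count == FD_FIFO_CAPACITY)
        return -1;
    q->fds[(q->head + q->count) % FD_FIFO_CAPACITY] = fd;
    q->count++;
    return 0;
}

/* Remove from the head (oldest first). Returns 0 on success, -1 if empty. */
int fd_fifo_pop(fd_fifo_t *q, int *fd)
{
    if (q->count == 0)
        return -1;
    *fd = q->fds[q->head];
    q->head = (q->head + 1) % FD_FIFO_CAPACITY;
    q->count--;
    return 0;
}
```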


Summary of worker bugs that need to be fixed:

- convert fd_queue back into a FIFO
- add a counter of idle workers that blocks the listener (before it
  accepts a connection and calls ap_queue_push()) until a worker is
  available (without breaking restarts/shutdown).
- add a way to track open socket descriptors; when we get the signal to
  do a hard shutdown of the server, walk down this set and close the fds
  so we can halt any long-running requests.
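For the last item, a minimal sketch assuming a fixed-size,
mutex-protected array per child. conn_set_t, MAX_TRACKED and the
functions are hypothetical. Workers would add a socket when they pick
it up and remove it before closing it; on a hard shutdown the child
walks the set. shutdown() is called before close() because on some
platforms (Linux among them) close() alone does not wake a thread
already blocked in read() on that descriptor.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_TRACKED 64            /* hypothetical per-child limit */

typedef struct {
    pthread_mutex_t lock;
    int fds[MAX_TRACKED];
    int n;                        /* number of tracked descriptors */
} conn_set_t;

void conn_set_init(conn_set_t *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->n = 0;
}

/* Track a connection. Returns 0 on success, -1 if the set is full. */
int conn_set_add(conn_set_t *s, int fd)
{
    int rv = -1;
    pthread_mutex_lock(&s->lock);
    if (s->n < MAX_TRACKED) {
        s->fds[s->n++] = fd;
        rv = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return rv;
}

/* Stop tracking fd (swap-with-last removal); call before closing it. */
void conn_set_remove(conn_set_t *s, int fd)
{
    pthread_mutex_lock(&s->lock);
    for (int i = 0; i < s->n; i++) {
        if (s->fds[i] == fd) {
            s->fds[i] = s->fds[--s->n];
            break;
        }
    }
    pthread_mutex_unlock(&s->lock);
}

/* Hard shutdown: shut down and close every tracked socket so that
 * long-running requests fail out. Returns how many were closed. */
int conn_set_close_all(conn_set_t *s)
{
    int closed;
    pthread_mutex_lock(&s->lock);
    for (int i = 0; i < s->n; i++) {
        shutdown(s->fds[i], SHUT_RDWR);   /* wake blocked I/O */
        close(s->fds[i]);
    }
    closed = s->n;
    s->n = 0;
    pthread_mutex_unlock(&s->lock);
    return closed;
}
```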

-aaron