Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@httpd.apache.org
Date: Thu, 11 Apr 2002 16:01:27 -0700
From: Aaron Bannert <aaron@clove.org>
To: dev@httpd.apache.org
Subject: Re: [PATCH] convert worker MPM to leader/followers design
Message-ID: <20020411160127.M26866@clove.org>
Mail-Followup-To: Aaron Bannert <aaron@clove.org>, dev@httpd.apache.org
References: <20020411142538.K26866@clove.org>
 <4B8556EA-4D9B-11D6-B08E-000393753936@apache.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4B8556EA-4D9B-11D6-B08E-000393753936@apache.org>
User-Agent: Mutt/1.3.23i

On Thu, Apr 11, 2002 at 03:27:23PM -0700, Roy T. Fielding wrote:
> >Ok, now we're on the same page. I see this as a problem as well, but I
> >don't think this is what is causing the problem described earlier in this
> >thread. Considering how unlikely it is that all of the threads on one
> >process are on long-lived connections, I don't see this as a critical
> >short-term problem. What is more likely is that 'ab', used to observe
> >this phenomon, is flawed in a way that prevents it from truly testing the
> >concurrent processing capabilities of the worker MPM, when it is possible
> >for a request on a different socket to be returned sooner than another.
> >Flood would be much more appropriate for this kind of a test.
> 
> So, what you are saying is that it isn't common for Apache httpd to be used
> for sites that serve large images to people behind modems.  Right?  And
> therefore we shouldn't fix the only MPM that exists solely because sites
> that mostly serve large images to people behind modems didn't want the
> memory overhead of prefork.  Think about it.

I do not believe that we have a scalability problem in the worker MPM.
I believe we have a scalability problem in our testing tool. I agree
that there is a problem that can cause some new connections to appear
to hang under certain unlikely conditions, but I do not believe this can
cause the server to hang as a whole, nor do I believe that this problem
can show up enough to cause a ceiling on concurrent request processing.

Since this is an important issue, and I do not want this to become a
flame fest, I will describe what I think is happening here:

 The worker MPM has N children.
 Each child has M threads.
 Each thread can handle exactly 1 request at a time.

 In the worst case, imagine that M requests were all handled by the same
 child, and that 1 additional request arrives and is to be handled by
 that same child. In this case, that last request must wait for 1 of the M
 busy threads to finish with a previous request before it can be processed.
 The likelihood of this happening, however, is a function of the ability
 of the accept mutex to deal with lock contention, and the number of
 concurrent requests. In my opinion, this likelihood is very small,
 so small that in normal testing I do not believe we will encounter
 this scenario.

 Under typical conditions, long-running and short-running requests will
 be distributed throughout the children. In order for this scenario to
 occur, all M threads in a child would have to be in use by a long-lived
 connection. Assuming a random distribution of these clients, I don't
 see how this scenario can consistently occur except when all threads
 across all children are already being occupied by long-lived connections.
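
 To put rough numbers on the random-distribution argument above, here is
 a minimal sketch in Python. It is only an illustration under a stated
 assumption: K long-lived connections land on K of the N*M thread slots
 uniformly at random. It does not model the accept mutex or anything else
 in the worker MPM, and the function names are made up for this example.

```python
from math import comb


def p_child_saturated(n_children, m_threads, k_long):
    """Probability that one particular child has all M threads held by
    long-lived connections.

    Assumption (not from the MPM code): the K long-lived connections
    occupy K of the N*M thread slots chosen uniformly at random.
    """
    total = n_children * m_threads
    if not 0 <= k_long <= total:
        raise ValueError("k_long must be between 0 and N*M")
    if k_long < m_threads:
        # Not enough long-lived connections to fill even one child.
        return 0.0
    # Fix the M slots of the chosen child as occupied, then count the ways
    # to place the remaining K-M connections in the other N*M-M slots.
    return comb(total - m_threads, k_long - m_threads) / comb(total, k_long)


def p_any_child_saturated_upper(n_children, m_threads, k_long):
    """Union-bound upper limit on the probability that at least one child
    is fully occupied by long-lived connections."""
    p_one = p_child_saturated(n_children, m_threads, k_long)
    return min(1.0, n_children * p_one)


if __name__ == "__main__":
    # Hypothetical configuration: 4 children, 25 threads each.
    for k in (25, 50, 75, 100):
        print(k, p_any_child_saturated_upper(4, 25, k))
```

 Under this model the probability stays tiny until K gets close to N*M,
 and reaches 1 only when every slot is taken. If the accept mutex does
 not spread connections evenly, the uniform assumption no longer holds
 and these numbers say nothing.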


Please do not misunderstand me. I believe Brian has a good design for a
potential replacement of the worker MPM, but I do not believe that it is
right to change the current design. It would be much more appropriate to
create another MPM to implement and test this design, so that the group as
a whole can determine when it has become better than the worker.

-aaron