Return-Path:
Delivered-To: apmail-httpd-dev-archive@httpd.apache.org
Received: (qmail 53898 invoked by uid 500); 11 Apr 2002 23:01:27 -0000
Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@httpd.apache.org
list-help:
list-unsubscribe:
list-post:
Delivered-To: mailing list dev@httpd.apache.org
Received: (qmail 53877 invoked from network); 11 Apr 2002 23:01:27 -0000
Date: Thu, 11 Apr 2002 16:01:27 -0700
From: Aaron Bannert
To: dev@httpd.apache.org
Subject: Re: [PATCH] convert worker MPM to leader/followers design
Message-ID: <20020411160127.M26866@clove.org>
Mail-Followup-To: Aaron Bannert, dev@httpd.apache.org
References: <20020411142538.K26866@clove.org> <4B8556EA-4D9B-11D6-B08E-000393753936@apache.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4B8556EA-4D9B-11D6-B08E-000393753936@apache.org>
User-Agent: Mutt/1.3.23i
X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N

On Thu, Apr 11, 2002 at 03:27:23PM -0700, Roy T. Fielding wrote:
> >Ok, now we're on the same page. I see this as a problem as well, but I
> >don't think this is what is causing the problem described earlier in this
> >thread. Considering how unlikely it is that all of the threads on one
> >process are on long-lived connections, I don't see this as a critical
> >short-term problem. What is more likely is that 'ab', used to observe
> >this phenomenon, is flawed in a way that prevents it from truly testing the
> >concurrent processing capabilities of the worker MPM, when it is possible
> >for a request on a different socket to be returned sooner than another.
> >Flood would be much more appropriate for this kind of a test.
>
> So, what you are saying is that it isn't common for Apache httpd to be used
> for sites that serve large images to people behind modems. Right?
> And therefore we shouldn't fix the only MPM that exists solely because sites
> that mostly serve large images to people behind modems didn't want the
> memory overhead of prefork. Think about it.

I do not believe that we have a scalability problem in the worker MPM.
I believe we have a scalability problem in our testing tool. I agree that
there is a problem that can cause some new connections to appear to hang
under certain unlikely conditions, but I do not believe this can cause
the server as a whole to hang, nor do I believe that this problem can
show up often enough to put a ceiling on concurrent request processing.

Since this is an important issue, and I do not want this to become a
flame fest, I will describe what I think is happening here:

The worker MPM has N children. Each child has M threads. Each thread
can handle exactly 1 request at a time. In the worst case, imagine that
M requests are all being handled by the same child, and that 1 additional
request arrives and is to be handled by that same child. In this case,
that last request must wait for 1 of the M busy threads to finish its
current request before it can be processed.

The likelihood of this happening, however, is a function of the ability
of the accept mutex to deal with lock contention, and of the number of
concurrent requests. In my opinion, this likelihood is very small, so
small that I do not believe we will encounter this scenario in normal
testing.

Under typical conditions, long-running and short-running requests will
be distributed throughout the children. In order for this scenario to
occur, all M threads in a child would each have to be in use by a
long-lived connection. Assuming a random distribution of these clients,
I don't see how this scenario can consistently occur except when all
threads across all children are already occupied by long-lived
connections.

Please do not misunderstand me.
I believe Brian has a good design for a potential replacement of the
worker MPM, but I do not believe that it is right to change the current
design. It will be much more appropriate to create another MPM to
implement and test this design, so that the group as a whole can
determine when it has become better than the worker.

-aaron
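
The saturation scenario described above (all M threads of one child held
by long-lived connections) can be put in rough numbers. What follows is
only a minimal sketch under a strong assumption: each thread slot
independently holds a long-lived connection with some probability p_long.
That independence is an illustrative stand-in for the "random
distribution" of clients; it ignores the accept mutex and any real
scheduling behavior. The function names, p_long, and the sample values
are hypothetical and do not come from httpd.

```python
def p_child_saturated(p_long: float, threads_per_child: int) -> float:
    """Probability that every one of the M threads in a single child
    is held by a long-lived connection.

    ASSUMPTION: each thread is independently long-lived with
    probability p_long (a simplification, not how httpd schedules).
    """
    if not 0.0 <= p_long <= 1.0:
        raise ValueError("p_long must be between 0 and 1")
    if threads_per_child < 1:
        raise ValueError("threads_per_child must be at least 1")
    return p_long ** threads_per_child


def p_any_child_saturated(p_long: float, threads_per_child: int,
                          children: int) -> float:
    """Probability that at least one of the N children is saturated,
    under the same independence assumption."""
    if children < 1:
        raise ValueError("children must be at least 1")
    p_one = p_child_saturated(p_long, threads_per_child)
    # Complement of "no child is saturated".
    return 1.0 - (1.0 - p_one) ** children


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    for p_long in (0.5, 0.9, 0.99, 1.0):
        print(p_long, p_any_child_saturated(p_long, 25, 4))
```

Under this model the chance of a saturated child falls off geometrically
with M, and only approaches certainty as p_long approaches 1, that is,
when nearly every thread in every child is already held by a long-lived
connection. Whether real traffic behaves this independently is exactly
the point in dispute, so the numbers are an illustration of the argument
and not evidence for it.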