Date: Tue, 2 May 2000 09:36:48 -0700 (PDT)
From: dean gaudet
To: new-httpd@apache.org
Subject: Re: Maintenance of mod_proxy and async i/o
In-Reply-To: <009901bfb11b$e2e9a6e0$c1e01b09@raleigh.ibm.com>
X-comment: visit http://arctic.org/~dean/legal for information regarding copyright and disclaimer.

On Fri, 28 Apr 2000, Bill Stoddard wrote:

> Could you elaborate? I recall the discussion a while back but I didn't
> really 'get it' at the time.
>
> I am familiar with async io and Windows NT io completion ports and I
> think I have an idea of how to get async network io going in the
> Windows MPM. How would Dean's proposal work?

the basic realisation goes like this:

- it's "easy" to program in a threaded or multiprocess model because
  you have a stack and can program in a typical "linear" manner where
  you don't really have to worry about when i/o blocks.
- the alternative -- event based programming (select/poll, async i/o,
  completion ports, callbacks, it has lots of names) requires you to
  keep all state about where you are in your code in a structure
  associated with the connection. this means that every time an i/o
  could block you have to set up a callback to handle when the i/o is
  ready. this is a complex way to program.
- event based programming *tends* to be faster than threaded
  programming because threaded programming has a stack per connection
  -- and stacks chew up cache lines, TLB entries, and depending on the
  OS chew up extra kernel RAM (e.g. linux).
- most folks, when given the choice, prefer programming in the
  threaded model.
- it's way way way way way easier to write a dynamic content engine
  such as perl or php in the threaded model.
- it's way easier to write the HTTP protocol logic that comes before
  invoke_handler in a threaded model.
- it's pretty damn simple to write an event based byte shuffler that
  can shuffle bytes from another socket/pipe, from disk, or from a big
  memory buffer (mmap or otherwise).
- it would be really rad if we could do some combination of threaded
  programming for complex handlers (php, perl), threaded programming
  for the HTTP protocol part up to invoke_handler, but have the OPTION
  of doing event programming for the final response when the final
  response is a simple object such as a file, another socket, or a
  memory object.
- fortunately, we can!
- all we do is allow a handler to return "hey, serve this
  socket/disk/memory object using whatever is your fastest method" up
  to the enclosing MPM. the fastest method might be sendfile, might be
  select/poll, might be completion ports, ...
- and we do a little more work to log

so, the MPM model becomes a collection of worker threads, plus an i/o
thread. the i/o thread handles accepting new connections, and pumping
data from simple objects (socket/pipe, disk, memory). the worker
threads take a "task", and run with it in a thread up until they get
to a "simple object", and then return back to the MPM. the "task" is
either a new request (consider keep-alive connections), or a logging
request.

this same model works for lots of other protocols as well -- IMAP, for
example, has lots and lots of connections which are almost entirely idle.
a small set of worker threads can process a command (or series of
commands), and then return the connection to the i/o thread when there
are no pending commands on it. ditto for SMTP (esp. with sendmail's
optimisation of holding up connections in case more mail arrives for
that destination). ditto for FTP.

-dean
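
Editorial sketch, not part of the original message: the worker-threads-plus-i/o-thread model described above might look roughly like the following. Everything here is an assumption made for illustration. The names SimpleObject, handle_request, worker, io_thread and run_demo are hypothetical and are not Apache APIs, the "simple object" is only a memory buffer, logging tasks are left out, and the i/o thread polls the ready queue on a short timeout where a real MPM would more likely use a wakeup pipe or completion port.

```python
import queue
import selectors
import socket
import threading


class SimpleObject:
    """Hypothetical handler result: a memory buffer to shuffle to a socket."""

    def __init__(self, sock, data):
        self.sock = sock
        self.view = memoryview(data)


def handle_request(sock, request):
    """Linear, threaded code: stands in for protocol logic plus a handler."""
    return SimpleObject(sock, b"echo:" + request.upper())


def worker(tasks, ready):
    """Worker thread: run one task, then hand the result back to the i/o thread."""
    while True:
        task = tasks.get()
        if task is None:  # shutdown marker
            return
        ready.put(handle_request(*task))


def io_thread(server_socks, tasks, ready):
    """Event based byte shuffler: waits for requests, pumps simple objects."""
    sel = selectors.DefaultSelector()
    for s in server_socks:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
    remaining = len(server_socks)
    while remaining:
        # Pick up simple objects that workers have returned.
        while True:
            try:
                obj = ready.get_nowait()
            except queue.Empty:
                break
            sel.register(obj.sock, selectors.EVENT_WRITE, obj)
        # Short timeout so the ready queue gets polled (assumption, see above).
        for key, events in sel.select(timeout=0.01):
            sock = key.fileobj
            if events & selectors.EVENT_READ:
                # Assumes the whole (tiny) request arrives in one recv.
                request = sock.recv(4096)
                sel.unregister(sock)
                tasks.put((sock, request))
            else:
                obj = key.data
                sent = sock.send(obj.view)
                obj.view = obj.view[sent:]
                if len(obj.view) == 0:  # response fully written
                    sel.unregister(sock)
                    sock.close()
                    remaining -= 1
    sel.close()


def run_demo(requests, n_workers=2):
    """Drive the sketch with in-process socket pairs instead of real clients."""
    tasks, ready = queue.Queue(), queue.Queue()
    pairs = [socket.socketpair() for _ in requests]
    for (client, _), req in zip(pairs, requests):
        client.sendall(req)
    workers = [threading.Thread(target=worker, args=(tasks, ready))
               for _ in range(n_workers)]
    io = threading.Thread(target=io_thread,
                          args=([s for _, s in pairs], tasks, ready))
    for t in workers + [io]:
        t.start()
    replies = []
    for client, _ in pairs:
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:  # server side closed after sending
                break
            chunks.append(chunk)
        replies.append(b"".join(chunks))
        client.close()
    io.join()
    for _ in workers:
        tasks.put(None)
    for t in workers:
        t.join()
    return replies
```

Calling run_demo([b"get /a", b"get /b"]) pushes two requests through the worker pool and returns the bytes each client read back. The point of the split is that handle_request stays ordinary blocking-style code, while only the final byte shuffling is event driven.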