Mailing-List: contact new-httpd-help@apache.org; run by ezmlm
Precedence: bulk
Reply-To: new-httpd@apache.org
Date: Tue, 2 May 2000 09:36:48 -0700 (PDT)
From: dean gaudet <dgaudet-list-new-httpd@arctic.org>
To: new-httpd@apache.org
Subject: Re: Maintenance of mod_proxy and async i/o
In-Reply-To: <009901bfb11b$e2e9a6e0$c1e01b09@raleigh.ibm.com>
Message-ID: <Pine.LNX.4.21.0005020922430.22518-100000@twinlark.arctic.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


On Fri, 28 Apr 2000, Bill Stoddard wrote:

> Could you elaborate? I recall the discussion a while back but I didn't
> really 'get it' at the time.
> 
> I am familiar with async io and Windows NT iocompletion ports and I
> think I have an idea of how to get async network io going in the
> Windows MPM. How would Dean's proposal work?

the basic realisation goes like this:

- it's "easy" to program in a threaded or multiprocess model because you
  have a stack and can program in a typical "linear" manner where you
  don't really have to worry about when i/o blocks.

- the alternative -- event based programming (select/poll, async i/o,
  completion ports, callbacks, it has lots of names) requires you to keep
  all state about where you are in your code in a structure associated
  with the connection.  this means that every time an i/o could block
  you have to set up a callback to handle when the i/o is ready.  this
  is a complex way to program.

- event based programming *tends* to be faster than threaded programming
  because threaded programming has a stack per connection -- and stacks
  chew up cache lines, TLB entries, and depending on the OS chew up
  extra kernel RAM (e.g. linux).

- most folks, when given the choice, prefer programming in the threaded
  model.

- it's way way way way way easier to write a dynamic content engine
  such as perl or php in the threaded model.

- it's way easier to write the HTTP protocol logic that comes before
  invoke_handler in a threaded model.

- it's pretty damn simple to write an event based byte shuffler that
  can shuffle bytes from another socket/pipe, from disk, or from
  a big memory buffer (mmap or otherwise).

- it would be really rad if we could use threaded programming for complex
  handlers (php, perl) and for the HTTP protocol part up to invoke_handler,
  but have the OPTION of doing event programming for the final response
  when the final response is a simple object such as a file, another
  socket, or a memory object.

- fortunately, we can!

- all we do is allow a handler to return "hey, serve this
  socket/disk/memory object using whatever is your fastest method"
  up to the enclosing MPM.  the fastest method might be sendfile,
  might be select/poll, might be completion ports, ...

- and we do a little more work to log, since the response is only
  finished after the handler has returned.
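
a rough sketch of the "handler returns a simple object" idea, in python
rather than C just to keep it short.  every name here (MemoryObject,
FileObject, hello_handler, static_handler, mpm_serve) is made up for
illustration and is not apache API.  the only real calls are python's
socket.sendfile, which uses os.sendfile where the platform has it and
falls back to plain send otherwise, and sendall.

```python
import os
import socket
import tempfile
from dataclasses import dataclass

# every name below is hypothetical; this is not apache's handler API


@dataclass
class MemoryObject:
    data: bytes


@dataclass
class FileObject:
    path: str


def hello_handler(request_path):
    """a "complex" handler: runs linearly, then describes the response
    instead of writing it to the client itself."""
    return MemoryObject(b"hello from " + request_path.encode())


def static_handler(request_path):
    """a static file handler: nothing to compute, just name the file."""
    return FileObject(request_path)


def mpm_serve(sock, obj):
    """the enclosing MPM picks whatever it considers fastest per object.
    returns the number of bytes sent."""
    if isinstance(obj, FileObject):
        with open(obj.path, "rb") as f:
            # os.sendfile where available, plain send() otherwise
            return sock.sendfile(f)
    if isinstance(obj, MemoryObject):
        sock.sendall(obj.data)
        return len(obj.data)
    raise TypeError(f"not a simple object: {obj!r}")


def demo():
    """serve one file object and one memory object over a socketpair."""
    srv, cli = socket.socketpair()
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"static bytes")
        os.close(fd)
        sent_file = mpm_serve(srv, static_handler(path))
        got_file = cli.recv(sent_file, socket.MSG_WAITALL)
        sent_mem = mpm_serve(srv, hello_handler("/x"))
        got_mem = cli.recv(sent_mem, socket.MSG_WAITALL)
    finally:
        os.remove(path)
        srv.close()
        cli.close()
    return sent_file, got_file, sent_mem, got_mem
```

the point of the split is that the handlers never touch the socket, so
the MPM is free to swap sendfile for select/poll or completion ports
without any handler changing.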

so, the MPM model becomes a collection of worker threads, plus an
i/o thread.

the i/o thread handles accepting new connections, and pumping
data from simple objects (socket/pipe, disk, memory).  the worker
threads take a "task", and run with it in a thread up until they
get to a "simple object", and then return back to the MPM.
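
the worker/i/o split can be sketched roughly like this, again in python
with made-up names (Pump, worker, io_loop, drain, demo).  it assumes the
simple object is a memory buffer and leaves out accept() and request
parsing entirely, so it only shows the hand-off: a worker runs linear,
blocking code, and a single event based thread does the byte shuffling
with non-blocking writes.

```python
import queue
import selectors
import socket
import threading

# hypothetical names throughout; accept() and request parsing are left out


class Pump:
    """per-connection state the i/o thread keeps while shuffling bytes."""

    def __init__(self, sock, data):
        self.sock = sock
        self.data = memoryview(data)
        self.offset = 0


def worker(tasks, handoff):
    """worker thread: plain linear code, free to block."""
    while True:
        task = tasks.get()
        if task is None:  # shutdown marker
            return
        sock, name = task
        # stands in for protocol parsing plus a php/perl style handler
        body = b"hello " + name.encode() + b"\n"
        # reached a "simple object": give the connection to the i/o thread
        handoff.put(Pump(sock, body * 20000))


def io_loop(handoff, expected):
    """single i/o thread: event based, never blocks on any one socket."""
    sel = selectors.DefaultSelector()
    done = 0
    while done < expected:
        block = not sel.get_map()  # nothing to pump: wait for a hand-off
        while True:
            try:
                pump = handoff.get(block=block)
            except queue.Empty:
                break
            block = False
            pump.sock.setblocking(False)
            sel.register(pump.sock, selectors.EVENT_WRITE, pump)
        for key, _events in sel.select(timeout=0.05):
            pump = key.data
            try:
                pump.offset += pump.sock.send(pump.data[pump.offset:])
            except BlockingIOError:
                continue
            if pump.offset == len(pump.data):
                sel.unregister(pump.sock)
                pump.sock.close()
                done += 1
    sel.close()


def drain(sock):
    """client side: read until the server end closes."""
    chunks = []
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def demo():
    tasks, handoff = queue.Queue(), queue.Queue()
    pairs = [socket.socketpair() for _ in range(2)]
    threads = [
        threading.Thread(target=worker, args=(tasks, handoff)),
        threading.Thread(target=io_loop, args=(handoff, len(pairs))),
    ]
    for t in threads:
        t.start()
    for i, (srv, _cli) in enumerate(pairs):
        tasks.put((srv, f"conn{i}"))
    tasks.put(None)
    results = [drain(cli) for _srv, cli in pairs]
    for t in threads:
        t.join()
    for _srv, cli in pairs:
        cli.close()
    return results
```

note that the only per-connection state on the event side is a buffer
and an offset, which is why the byte shuffler stays simple while the
complicated code keeps its stack.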

the "task" is either a new request (consider keep-alive connections),
or a logging request.
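
one way to read the two task kinds, sketched in python with made-up
names (RequestTask, LogTask, run_task, io_thread_finish).  the
assumption here is that the byte count is only known once the i/o thread
has finished sending, so it queues a logging task for a worker to pick
up afterwards.  the log line format is invented too.

```python
from collections import deque
from dataclasses import dataclass

# hypothetical task shapes; the text above only says a task is either
# "a new request" or "a logging request"


@dataclass
class RequestTask:
    conn_id: int
    request_line: str


@dataclass
class LogTask:
    conn_id: int
    request_line: str
    bytes_sent: int


def run_task(task, io_queue, access_log):
    """what a worker thread does with whichever task it picks up."""
    if isinstance(task, RequestTask):
        # stands in for running the handler up to its simple object
        body = b"x" * 42
        io_queue.append((task, body))
    elif isinstance(task, LogTask):
        access_log.append(
            f'{task.conn_id} "{task.request_line}" {task.bytes_sent}'
        )
    else:
        raise TypeError(f"unknown task: {task!r}")


def io_thread_finish(io_queue, task_queue):
    """stand-in for the i/o thread completing one response: the bytes
    are out, so hand a logging task back to the workers."""
    task, body = io_queue.popleft()
    task_queue.append(LogTask(task.conn_id, task.request_line, len(body)))


def demo():
    task_queue = deque([RequestTask(1, "GET / HTTP/1.1")])
    io_queue, access_log = deque(), []
    run_task(task_queue.popleft(), io_queue, access_log)  # request
    io_thread_finish(io_queue, task_queue)                # response sent
    run_task(task_queue.popleft(), io_queue, access_log)  # log entry
    return access_log
```

on a keep-alive connection the i/o thread would presumably also queue
another RequestTask once the next request shows up on the socket.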

this same model works for lots of other protocols as well -- IMAP,
for example, has lots and lots of connections which are almost
entirely idle.  a small set of worker threads can process a
command (or series of commands), and then return the connection
to the i/o thread when there are no pending commands on it.

ditto for SMTP (esp. with sendmail's optimisation of holding up
connections in case more mail arrives for that destination).

ditto for FTP.

-dean