httpd-dev mailing list archives

From Manoj Kasichainula <>
Subject Another async I/O proposal [was Re: request for comments: multiple-connections-per-thread MPM design]
Date Mon, 25 Nov 2002 08:02:16 GMT
I have some suggestions for Brian's design proposal which I'm pondering
and writing up in another message, but meanwhile, I have an alternate
proposal that I've been rolling around inside my head for months now, so
I figured I might as well write it up.

It involves (mostly) a single pool of threads all running through an
event loop. I think the below could be written as a single MPM for a
specific operating system, or a generic MPM optimized for many OSes, or
just APR.

It is also a hybrid sync/async approach, but most aspects of the approach
can be handled by a single thread pool instead of multiple.

Please punch holes in this proposal at will.


Ticket - something to do, e.g. [READ, fd], [LISTEN, fd], [WRITE, fd,
buckets]. It's a request for the main event loop to give us back an
event when the operation completes.

Event - something that has been done (with some of the data used in it)
and its result, e.g. [READ, buckets], [LISTEN, fd], [WRITE], etc.

Both of the above include contexts for state maintenance of course.

Event processor - receives events, processes them, decides on
consequences, and returns a new ticket to handle, or NULL if there is
nothing left to do.


We have a single pool of threads, growing and shrinking as needed, in a
standard event-handling loop:

while (event = get_next_event())
   add more spare threads if needed
   event_processor = lookup_event_processor(event)
   ticket = event_processor(event)
   if (ticket) submit_ticket(ticket)
   exit loop (and thus end thread) if not needed

The event_processor can take as long as it wants, since there are other
threads who can wait for the next event.

Tickets could be handled in multiple disjoint iterations of the event
loop, but the event processors never see this. This is how Windows can
process a WRITE ticket for a file bucket with TransmitFile w/ completion
ports, Linux can (IIRC) use a non-blocking sendfile loop, and an
old-school unix can use a read-write loop. Note that I did mention
platform-specific code; does APR know how to do async and nonblocking
I/O for various platforms in the optimal way? If not, this loop could.

submit_ticket and get_next_event work together to provide the smarts of
the loop. On old-school unix, submit_ticket would take a ticket and set
up the fd_set, and get_next_event would select() on the fd_set and do
what's appropriate, which doesn't always involve a quick system call and
a return of an event. For example, while handling a WRITE ticket, we
might only be able to partially complete the write without blocking. In
that case, get_next_event could rejigger the fd_set and go back to the
select() call.

HTTP's event processors, in a simple case where every handler reads the
HTTP request data, processes it, and then returns, look sort of like:

http_listen_processor = http_request_processor:
    input_buckets += get_buckets(event)
    if (need_more_for_this_request)
        return new_read_ticket(fd, http_request_processor, context)

    /* The next call can take a long time and can be written in a
     * blocking fashion */
    output_buckets = request_handler(fd, input_buckets)
    return new_write_ticket(fd, output_buckets,
                            http_keepalive_processor, context)

http_keepalive_processor:
    if (keepalive)
        return new_read_ticket(fd, http_request_processor, context)
    return NULL

If we want to allow it, the request_handler() call above could even do
its own reading and writing of the file descriptor.

In the single process case on old-school Unix, submit_ticket can just
tell get_next_event to select+accept w/ a simple mutex around them.  In
the multiple process case, it can wait on a queue for an outside
listener thread like in Brian's description. And in some Unixes (and I
believe Windows with completion ports), the multiprocess case isn't a
concern. Linux 2.6 could use epoll and avoid all these issues, and 2.4
has a realtime signal interface to do the same thing I believe.

I've glossed over where the conn_recs and request_recs get built.
That's mainly because I don't know how the multi-protocol stuff deals
with request_recs :). I would expect conn_recs to be completely generic,
and request_recs to be somewhat or completely http-specific. Generic
portions could go into the main event loop, HTTP portions go into the
http event processors.

Disadvantages of this proposal I can think of offhand:

- Because threads are mostly in one large pool, some common structures
  have to be protected through a mutex. I like paying for mutexes more
  than paying for context switches though.

- We're creating and destroying a lot of "objects" (tickets and
  events). I don't think there'll be much overhead, since these aren't
  real OO objects, but we have to be careful.


Advantages of this proposal:

- Async I/O, introduced gradually throughout the server. At first, this
  can just be yet another MPM, with no change to the rest of the server.
  But eventually, it could allow both completely event-driven and
  completely synchronous protocol handlers.  The event-driven protocol
  handlers can then allow event-driven user modules if they choose, or
  run user modules synchronously, or some combination of the two. A
  server filled only with event-driven protocols and event-driven
  modules can run with as few as one thread per CPU, with no other
  tweaking.

- There's no bottleneck where a single thread might block unexpectedly
  and hold up the rest of the process, unless we're forced to put a
  mutex around a suspect system call. I don't think there is in Brian's
  design either, but I haven't thought it through completely :)
- The framework can be reused by different operating systems, each
  optimizing as much or as little as they see fit, or all wrapped in APR
  if we choose. submit_ticket and get_next_event should be the only
  calls that need to be replaced.

- Minimized context switches. If get_next_event is crafted
  appropriately, we could even have thread affinity for connections,
  meaning that if there's only one connection coming in at a time, only
  one thread ever runs.

- Transparent support for multiple CPUs
