httpd-dev mailing list archives

From Dean Gaudet <>
Subject Re: Proposal: Get rid of most accept mutex calls on hybrid server.
Date Fri, 14 May 1999 15:05:20 GMT
On Thu, 13 May 1999, Cliff Skolnick wrote:

> There's no limit to the number of LWPs that can select() on a socket; it
> should be noted that poll() is preferred, as select() in Solaris is
> implemented using poll() (i.e. the select() args are converted to
> pollfd_t's on the stack (a 1024-element array), poll() is called, and
> the results are converted back to a select() mask).

Yeah whenever we say "select" we really mean "select or poll as
appropriate" (at least that's what I mean -- because linux 2.2.x also
implements poll as the core primitive and converts select to poll... and
because poll really is the only logical alternative for thousands of fds). 

> The implementation of poll() changed in Solaris 7, as several apps (httpd,
> database, ...) required the ability to poll() on many thousands of FDs;
> prior to Solaris 7 it was a typical linked list of waiters per file_t
> (and didn't scale well :().

Interesting... if there are n waiters waiting on f_1, f_2, ... f_n fds
respectively, it requires O(f_1 + f_2 + ... + f_n) time to tell the kernel
about the fds you're interested in... I'm not seeing a way to shorten that
which would make it worthwhile to change the linked list. 

> As of Solaris 7 a scheme referred to as /dev/poll was implemented such that
> pollfd_t's are registered with the underlying FS (i.e. UFS, SOCKFS, ...)
> and the FS does asynchronous notification. The end result is that poll()
> now scales to tens of thousands of FDs per LWP (as well as a new API for
> /dev/poll such that you open /dev/poll and do write()s (to register a number
> of pollfd's) and read()s (to wait for, or in the case of nonblocking check
> for, pollfd event(s)), using the /dev/poll API memory is your only limit
> for scalability.

Now that's real nice.  I've been advocating this on linux kernel for a
long time.  Say hello to completion ports the unix way.  I'm assuming they
do the "right thing" and wake up in LIFO order, and allow you to read
multiple events at once.
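For the curious, the registration/collection cycle described above looks roughly like this Solaris-only sketch (reconstructed from the description in this thread plus the devpoll headers; note the released API collects events with an ioctl(DP_POLL) rather than a plain read()):

```c
/* Solaris-only sketch -- will not compile elsewhere. */
#include <sys/devpoll.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define MAXEVENTS 1024

int wait_for_events(int *fds, int nfds, struct pollfd *events)
{
    int dpfd = open("/dev/poll", O_RDWR);

    /* Register interest once per fd with a write(); the kernel keeps
     * the set around, so there is no per-call O(nfds) handoff as with
     * a plain poll() loop. */
    for (int i = 0; i < nfds; i++) {
        struct pollfd pfd = { fds[i], POLLIN, 0 };
        write(dpfd, &pfd, sizeof pfd);
    }

    /* Collect ready fds; dp_timeout of -1 blocks until events arrive. */
    struct dvpoll dvp = { events, MAXEVENTS, -1 };
    int n = ioctl(dpfd, DP_POLL, &dvp);
    /* events[0..n-1] now hold the fds with pending revents. */
    return n;
}
```

The key win is that the interest set persists in the kernel across waits, so each wakeup costs time proportional to the number of ready fds, not the number of registered ones.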

In case it wasn't obvious -- I planned the event thread to be customized
to each platform... it's where we get to take advantage of
platform-specific extensions (and quirks).

So far it sounds like only *BSD has problems with lots of processes
blocked in select() on the same socket.  We could use flock() locking
(with LOCK_NB) to arbitrate which event thread is handling the listening
sockets; it won't be any worse than what we have now, and will be lots
better in other ways. 

I really do think the architecture has all the right trade-offs: 

- We get to use threads for the protocol handling code, which is real nice
  because we don't have to build complex state engines to handle sockets
  blocking.  The common case is that all the protocol stuff is in the
  first packet.

- Module writers get to use threads, which gives them the same
  straightforward programming model.

- The people who whine about "why does apache not use select?" are
  appeased because on static-only servers we do use select for pretty
  much everything.

- We're set to handle hundreds upon hundreds of long haul slow clients
  downloading large responses.

