httpd-dev mailing list archives

From Cliff Skolnick <cl...@steam.com>
Subject Re: Proposal: Get rid of most accept mutex() calls on hybrid server.
Date Thu, 13 May 1999 10:24:46 GMT

I asked some folks from Sun about this thread; the response is below.

>On Tue, 11 May 1999, Tony Finch wrote:
>
>> Dean Gaudet <dgaudet@arctic.org> wrote:
>> >On Mon, 10 May 1999, Tony Finch wrote:
>> >> Dean Gaudet <dgaudet@arctic.org> wrote:
>> >> >
>> >> >Actually, I suspect that we don't really want to interprocess lock at
>> >> >all in the multithreaded server.  We use non-blocking listening
>> >> >sockets, and pay the wake-all cost for the small number of processes
>> >> >(we're talking like 8 processes, right?)
>> >> 
>> If there's a select collision, isn't *every* process woken up (not
>> >> just the httpds)?
>> >
>> >I'm not sure what you mean...  if a kernel had only one global sleeping
>> >queue, yeah... but then there'd be no way for us to avoid thundering herd,
>> >since everything would be awakened at all times.  But kernels typically
>> >have a lot of sleeping queues... including one on every socket.
>> 
>> Sorry, I was too terse. The BSD network stack only has space for
>> keeping track of one pid select()ing on each socket. If more than one
>> child select()s on a listen socket at the same time the kernel cannot
>> keep track of them so it doesn't even try: it just marks a collision
>> and when a connection arrives on that socket it wakes up every process
>> in select().
>
>Hmm, that sucks.  Linux keeps a list of all pids on each socket... it
>doesn't cost much: the select() code allocates a page of memory to store
>the wait list elements in.  For stuff other than select, the wait list
>elements are allocated on the kernel stack.  So even though there's a
>dynamic cost to allocating list elements, it's fairly cheap. 
>
>I wonder what solaris does (pretty much the only other platform I care
>about ;) 
>
>Dean
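
For context on the non-blocking-listener approach Dean describes above: every
process that wakes up does a non-blocking accept(), and the ones that lose the
race just see EWOULDBLOCK/EAGAIN and go back to waiting. A rough sketch only
(the function and handler names are illustrative, not Apache's actual code):

    #include <sys/socket.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <poll.h>

    extern void handle_connection(int fd);   /* hypothetical request handler */

    /* Worker loop over a non-blocking listening socket.  After a wakeup,
     * accept() either returns a connection or fails with EWOULDBLOCK when
     * another process got there first.  Sketch only, not Apache code. */
    void worker_loop(int listen_fd)
    {
        struct pollfd pfd;
        int conn;

        fcntl(listen_fd, F_SETFL, fcntl(listen_fd, F_GETFL, 0) | O_NONBLOCK);

        pfd.fd = listen_fd;
        pfd.events = POLLIN;

        for (;;) {
            if (poll(&pfd, 1, -1) <= 0)
                continue;                    /* signal or error; retry */

            conn = accept(listen_fd, NULL, NULL);
            if (conn < 0) {
                if (errno == EWOULDBLOCK || errno == EAGAIN)
                    continue;                /* another process won the race */
                continue;                    /* transient error; retry */
            }
            handle_connection(conn);
            close(conn);
        }
    }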

-- forwarded message from a kind person at Sun --

There's no limit to the number of LWPs that can select() on a socket. It
should be noted that poll() is preferred, since select() in Solaris is
implemented using poll(): the select() arguments are converted to
pollfd_t's on the stack (a 1024-element array), poll() is called, and the
results are converted back into a select() mask.
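
For reference, a minimal sketch of waiting on a single listening socket with
poll() directly instead of select() (the fd comes from the caller; error
handling trimmed; not Apache code):

    #include <poll.h>
    #include <errno.h>

    /* Wait for activity on listen_fd using poll() rather than select().
     * Returns the poll() result: >0 if listen_fd has an event pending,
     * 0 on timeout, -1 on error. */
    int wait_for_connection(int listen_fd, int timeout_ms)
    {
        struct pollfd pfd;

        pfd.fd = listen_fd;
        pfd.events = POLLIN;

        for (;;) {
            int rv = poll(&pfd, 1, timeout_ms);
            if (rv < 0 && errno == EINTR)
                continue;          /* interrupted by a signal; retry */
            return rv;
        }
    }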

The implementation of poll() changed in Solaris 7, as several applications
(httpd, databases, ...) required the ability to poll() on many thousands
of FDs. Prior to Solaris 7 it was a typical linked list of waiters per
file_t (and didn't scale well :(.

As of Solaris 7, a scheme referred to as /dev/poll was implemented: the
pollfd_t's are registered with the underlying FS (i.e. UFS, SOCKFS, ...)
and the FS does asynchronous notification. The end result is that poll()
now scales to tens of thousands of FDs per LWP. There is also a new API
for /dev/poll: you open /dev/poll and do write()s (to register a number of
pollfd's) and read()s (to wait for, or in the nonblocking case check for,
pollfd events). Using the /dev/poll API, memory is your only limit to
scalability.
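
A minimal sketch of that /dev/poll pattern, following the interface documented
in shipped Solaris releases (<sys/devpoll.h>): registration is a write() of
pollfd's, and the wait step shown here uses the DP_POLL ioctl, which is how
the documented API exposes the "read the ready events" step the message
describes. Names and sizes are illustrative:

    #include <sys/devpoll.h>   /* struct dvpoll, DP_POLL */
    #include <stropts.h>       /* ioctl() on Solaris */
    #include <fcntl.h>
    #include <unistd.h>
    #include <poll.h>

    #define MAX_EVENTS 1024

    /* Register sock_fd with /dev/poll and wait for events on it.
     * Illustrative only; real code would register many fds once and
     * then loop on the DP_POLL ioctl. */
    int devpoll_wait_example(int sock_fd, int timeout_ms)
    {
        struct pollfd reg;
        struct pollfd results[MAX_EVENTS];
        struct dvpoll dvp;
        int dpfd, nready;

        dpfd = open("/dev/poll", O_RDWR);
        if (dpfd < 0)
            return -1;

        /* Register interest: write an array of pollfd's to /dev/poll. */
        reg.fd = sock_fd;
        reg.events = POLLIN;
        reg.revents = 0;
        if (write(dpfd, &reg, sizeof(reg)) != sizeof(reg)) {
            close(dpfd);
            return -1;
        }

        /* Wait for events; DP_POLL fills results[] with ready pollfd's. */
        dvp.dp_fds = results;
        dvp.dp_nfds = MAX_EVENTS;
        dvp.dp_timeout = timeout_ms;
        nready = ioctl(dpfd, DP_POLL, &dvp);

        close(dpfd);
        return nready;   /* number of ready fds, 0 on timeout, -1 on error */
    }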


