Return-Path: owner-new-httpd
Received: by taz.hyperreal.com (8.6.12/8.6.5)
	id JAA24011; Tue, 27 Jun 1995 09:06:44 -0700
Received: from hoohoo.ncsa.uiuc.edu by taz.hyperreal.com (8.6.12/8.6.5)
	with SMTP id JAA24006; Tue, 27 Jun 1995 09:06:42 -0700
Received: by hoohoo.ncsa.uiuc.edu (AIX 3.2/UCB 5.64/4.03)
	id AA50114; Tue, 27 Jun 1995 11:06:11 -0500
Message-Id: <9506271606.AA50114@hoohoo.ncsa.uiuc.edu>
Subject: Re: Serialising accepts (was Re: apache_0.7.3h comments)
To: new-httpd@hyperreal.com
Date: Tue, 27 Jun 95 11:06:10 CDT
In-Reply-To: ; from "David Robinson" at Jun 27, 95 3:36 pm
From: Brandon Long
Organization: NCSA HTTPd Development Team
X-Mailer: ELM [version 2.3 PL11]
Sender: owner-new-httpd@apache.org
Precedence: bulk
Reply-To: new-httpd@apache.org

> >From the Sun guy:
> } On Solaris, you need to use it for single processors also and I suspect in
> } all library based Socket interface implementations like Unixware etc.
>   ~~~~~~~~~~~~~~~~~~~~
> } The reason is that accept() is not an atomic operation but just a call in
> } the ABI and any other call to the same fd at the same time can cause
> } unpredictable behaviour.
>
> In other words, relying on the kernel to serialise accepts may simply not
> work on several unixes, not just multiprocessor OSes. For example, I would
> suspect Linux might fall into this case. (Hence some of the problems
> reported.)

Alan Cox claimed that this was actually one of his tests for the Linux
networking code (multiple accepts on the same socket), so I find it strange
that it would be that way.

> I think we should bite the bullet and admit that use of multiple accepts is
> not portable, and that we have to re-work it to explicitly serialise the
> accepts in a portable manner.
>
> The alternative is to have ad hoc serialisation for each OS that needs it,
> possibly in a manner that is OS minor-version-number specific. For these
> OS's (and how will we know if an OS breaks until the users complain?)
> the ad hoc serialisation will probably be no faster than some portable
> schemes we could implement. For example, lockf() is actually implemented
> via IPC to a lockd daemon; IPC to the parent httpd would not be any slower.
>
> So why not have a queue managed by the parent of the children waiting
> to accept? Queue management occurs after a child has serviced a request,
> and so does not impact on the server response time. (Unlike the NCSA
> approach, where it is done between accept() returning and the child being
> passed the file-descriptor.)

We tried a similar approach in alpha, and it would blow up. Of course, with
the number of other bugs in the code at the time, it might have been
something else. The queue would grow huge, then drop (as some child came
free and found that all of the users had aborted). This was with sites like
hoohoo and www.acm.uiuc.edu (lots of multi-megabyte transfers), both of
which have really long-lived connections. I don't think it'll be any worse
than your current scheme, but remember that a lot of Un*ces have relatively
low limits on the number of open file descriptors per process (if you have
IPC between parent and child plus a queue, you might hit that limit pretty
quickly).

> So:
> The parent has a pipe to each child.
> It sends a short message to an idle child saying: 'you accept next'.
> On completing a request a child sends a message to the parent saying
> 'I am idle', and waits for a response message.
> The parent has some algorithm for deciding which idle child should be
> given the next connection. Round-robin would be cache-unfriendly; instead,
> send it to the most recently used child.

We moved to round robin to keep down bugs (at the time). It's also faster
to find the next child that way. Of course, explicitly keeping two separate
lists of children (busy and free) would negate most of that. Also, damn
select() (you have to test the whole list of fds).

Brandon
--
Brandon Long		"I think, therefore I am Confused."
NCSA HTTPd Server Team			- Robert Anton Wilson
University of Illinois		blong@uiuc.edu		Push