httpd-dev mailing list archives

From r..@ai.mit.edu (Robert S. Thau)
Subject Re: Linux - Apache problems... Answer.
Date Thu, 31 Aug 1995 10:57:11 GMT
   Date: Thu, 31 Aug 1995 10:24:03 -0400
   X-Sender: awm@qosina.com
   From: "Aram W. Mirzadeh" <awm@qosina.com>

   ** All the quotes are from Alan Cox (iialan@iifeak.swan.ac.uk) 
   ** And they're from a mail to Michael Davon (davon@web-depot.com)
   ** This discovery and answer are a direct result of Michael's search.  And I
   helped.

   After a lot of tweaking and playing around and talking to about 20
   different people, from Linus down to the TCP writers, this seems to
   be a problem with the TCP protocol as it was written into
   several systems, although it seems to show up more in
   Linux because of its compact networking software.  The problem
   is there in BSD, OSF/1, etc...

   Here is a quote from Alan Cox who is working with Linus, and 
   the tcp people:

   >It's a common TCP protocol problem - sockets can be tied up for
   >several minutes leading to stuck accept queues. OSF/1, BSD etc all
   >show the problem although Linux is a little more susceptible as
   >it allows longer for a connection to time out. That is patchable.

DAMN --- this should have rung a bell.  It's at this point
conventional wisdom for everyone who runs a busy server that they
should do whatever it takes on their Unix version to increase the
length of the accept "queue" (actually, the maximum number of
partially negotiated connections) --- many systems ship with
ridiculously low defaults, e.g., five.
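
From the application side, the request for a longer queue is just the
backlog argument to listen().  A minimal sketch in Python, assuming a
plain TCP listener; the helper name open_listener and the value 128 are
illustrative choices, not anything taken from Apache.  The kernel caps
whatever you ask for at its own limit, so a big number here does no
good until that limit is raised too.

```python
import socket

def open_listener(host="127.0.0.1", port=0, backlog=128):
    """Open a listening TCP socket, asking for a given accept backlog.

    `backlog` is only a request: the kernel silently clamps it to its
    own maximum, which is the limit the message above says to raise.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without waiting for old sockets to expire.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))  # port=0 lets the OS pick a free port
    srv.listen(backlog)
    return srv

if __name__ == "__main__":
    srv = open_listener()
    print("listening on", srv.getsockname())
    srv.close()
```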

Where this really screws you is if some router elsewhere on the
Internet goes down, and you were negotiating connections with a lot of
people on the other side of that router.  Then your accept "queue"
carries a lot of partially negotiated connections where negotiation
will never finish, and until it times out, you're hosed.
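
To see why a short queue hurts so much here, a toy model helps.  This is
only a sketch under stated assumptions: every incoming attempt stalls
forever, each one holds a slot until a fixed timeout, and the 75-second
timeout and the name simulate_backlog are hypothetical values picked for
illustration, not measurements from any real kernel.

```python
from collections import deque

def simulate_backlog(backlog, arrivals, handshake_timeout):
    """Model a queue of partially negotiated connections.

    `arrivals` is a sorted list of arrival times (seconds).  Every
    attempt is assumed to stall, so it occupies a slot until
    arrival + handshake_timeout.  Returns (time, queued) pairs, where
    queued is False when the queue was full and the attempt was dropped.
    """
    pending = deque()  # expiry times of occupied slots, oldest first
    results = []
    for t in arrivals:
        # Release slots whose handshake has timed out by time t.
        while pending and pending[0] <= t:
            pending.popleft()
        if len(pending) < backlog:
            pending.append(t + handshake_timeout)
            results.append((t, True))
        else:
            results.append((t, False))
    return results

if __name__ == "__main__":
    # One stalled attempt per second against a queue of five.
    for t, queued in simulate_backlog(5, list(range(10)), 75):
        print(t, "queued" if queued else "dropped")
```

With a queue of five, the first five stalled attempts fill it and
everything after that is dropped until the oldest entry times out.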

This problem was extensively discussed on www-talk sometime this
winter, I think, at which point it had been observed on IRIX, Solaris
and SunOS at least.  I'm not *positively* sure this is the problem
(can't be, don't run Linux), but it certainly sounds like a fit to the
symptoms which I observed repeatedly on my own SunOS server before we
recompiled the kernel to raise the limit.

rst
