httpd-dev mailing list archives

From Marc Slemko <>
Subject Re: Finish 1.2!
Date Fri, 10 Jan 1997 08:08:03 GMT
On Thu, 9 Jan 1997, Cliff Skolnick wrote:
> Well this is actually a well known TCP/IP bug.  If I remember correctly 
> it had something to do with the client "disappearing" from the net (like 
> hanging up their ISP) and not letting the socket fully close.  I thought 
> a while about why this could be worse with 1.2, and the only two things I 
> come up with are:

The problem of connections getting hung in FIN_WAIT_2 is a well known
"bug" in the TCP spec in that it lets a client's behaviour adversely
affect the server.  But I don't see the normal path to hanging in
FIN_WAIT_2 being that common.

What happens is:
	- the server sends a FIN to the client, which says that the
	  server will no longer be sending any more data.  When it
	  gets an ACK to this FIN, it goes into the FIN_WAIT_2 state.
	  Just looking at the packet exchange, a connection at this
	  stage could still be used to transfer data from the client
	  to the server because we have only done a half close.
	- the server is now in FIN_WAIT_2; it waits, forever or until
	  a timeout, for a FIN from the client; when it gets one, it
	  sends an ACK and goes to TIME_WAIT.
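
To make the sequence concrete, here is a minimal sketch of the transitions on the side that closes first. The enum and function names are made up for illustration and are not kernel or Apache identifiers, and the model leaves out simultaneous close and the case where the FIN and the ACK arrive together.

```c
/* Subset of TCP states on the side that closes first (here, the server). */
enum tcp_state { ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT };

/* Events that drive the transitions described in the list above. */
enum tcp_event { APP_CLOSE, RECV_ACK_OF_FIN, RECV_FIN };

/* Return the next state; combinations not modelled here leave it unchanged. */
enum tcp_state next_state(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ESTABLISHED:
        if (e == APP_CLOSE) return FIN_WAIT_1;       /* we send our FIN */
        break;
    case FIN_WAIT_1:
        if (e == RECV_ACK_OF_FIN) return FIN_WAIT_2; /* our FIN was ACKed */
        break;
    case FIN_WAIT_2:
        if (e == RECV_FIN) return TIME_WAIT;         /* we ACK the client's FIN */
        break;
    case TIME_WAIT:
        break;
    }
    return s;
}

/* Feed a sequence of events to a connection that starts out ESTABLISHED. */
enum tcp_state run_events(const enum tcp_event *ev, int n)
{
    enum tcp_state s = ESTABLISHED;
    for (int i = 0; i < n; i++)
        s = next_state(s, ev[i]);
    return s;
}
```

A client that ACKs the server's FIN and then vanishes corresponds to the sequence APP_CLOSE, RECV_ACK_OF_FIN with no RECV_FIN after it, which leaves this model sitting in FIN_WAIT_2.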

In the normal situation, unless it is set up to do a half-close, the
client should then send a FIN right away; the server ACKs it and
goes into TIME_WAIT.

On the surface of things, you should only get into this state if the
client disconnects between sending the ACK to your FIN, and sending a
FIN back to you.  In a normal situation, those two should happen
almost at the same time.  Hmm.  

I'll have to check the kernel, but by doing a sort of half-close
(shutdown(2) with 1 as the second parameter) Apache may be putting the
kernel in a state where things don't time out as they would if it
shut down both directions at the same time.  That said, lingering_close
may have nothing to do with it; there is certainly some evidence
that disabling it doesn't fix anything.
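
For reference, a rough sketch of what a half-close looks like from user space. It uses an AF_UNIX socket pair as a stand-in for a TCP connection, so it shows only the one-direction-closed behaviour, not FIN_WAIT_2 itself. half_close_demo is a hypothetical helper, not Apache code, and SHUT_WR is the named constant that current Linux and BSD headers define as 1.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Half-close on a local socket pair.
 * Returns 0 if, after shutdown(fd, SHUT_WR), the peer sees EOF while we
 * can still read what the peer sends; returns -1 otherwise.
 */
int half_close_demo(void)
{
    int sv[2];
    char buf[16];
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* "Server" side stops sending but keeps its receive side open. */
    if (shutdown(sv[0], SHUT_WR) != 0)
        goto done;

    /* The peer's read returns 0 (EOF): it has seen our end of stream. */
    if (read(sv[1], buf, sizeof buf) != 0)
        goto done;

    /* The other direction still works: the peer writes, we read. */
    if (write(sv[1], "late", 4) != 4)
        goto done;
    if (read(sv[0], buf, sizeof buf) != 4 || memcmp(buf, "late", 4) != 0)
        goto done;

    ok = 0;
done:
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

On a real TCP socket the shutdown call is what sends the FIN, and the descriptor stays open until close(2), which is the window in which the kernel's timeout behaviour might differ.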

Hmm.  I think that lingering_close will behave differently on Linux
than other platforms because Linux modifies the timer passed to
select, no?
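
To illustrate the select point: Linux updates the struct timeval to reflect the time left, while many other systems leave it alone, so a loop that reuses one timeval across calls waits for different lengths of time depending on the platform. Below is a minimal sketch of the portable pattern, which rebuilds the timeval before every call. wait_readable is a hypothetical helper, not the lingering_close code, and it treats any select error (including EINTR) as a failure for brevity.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to `usecs` microseconds, at most `tries` times, for fd to
 * become readable.  Returns 1 if readable, 0 if every wait timed out,
 * -1 on error.
 */
int wait_readable(int fd, long usecs, int tries)
{
    for (int i = 0; i < tries; i++) {
        fd_set rfds;
        struct timeval tv;

        /* Rebuild both arguments each time: select may modify them. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = usecs / 1000000;
        tv.tv_usec = usecs % 1000000;

        int n = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (n != 0)
            return n < 0 ? -1 : 1;
    }
    return 0;
}
```

If the assignment to tv were hoisted out of the loop, the second and later waits on Linux would start from whatever the first call left in tv (zero after a full timeout), so the loop would spin instead of waiting.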

> 	1) They were always there, and people are just noticing them.  The
> 	people who downgraded and still saw them said nothing, but the people
> 	who downgraded and did not see them spoke up really loudly.  The only
> 	real cause was the random there/not there factor.  I know I have
> 	always seen these until I applied a patch to get rid of them
> 	after a while, this was a kernel thing since these sockets are not
> 	tied to a user process.

I certainly would be able to believe that, but there seem to be too
many people who have a server that simply will not run under 1.2
because it runs out of mbufs, but under 1.1.1 it is definitely fine.

> 	2) Sockets are getting stuck in 1.2, increasing the chance that
> 	this may happen.  If the server does not close the connection and
> 	lets it time out I can guess there may be a greater chance of a
> 	FIN_WAIT2 when the server starts timing stuff out and you have a
> 	bunch of dialup users logging off the net.

> Any other thoughts?  Maybe we should ask the people to send the error_log and
> see if there are more timeouts reported for 1.2 than 1.1?  Way to test 
> for #2.

I'm not sure that would help that much.  I think that most of the
things that you would expect to cause this would be in http_main.c, so
perhaps getting them to try an http_main.c that was hacked to be as
much like 1.1.1 as possible would help.  We also can't forget that we
may be seeing several problems here.  With 1.1.1 on FreeBSD, I
normally see 60 or so connections in FIN_WAIT_2 on a server doing
perhaps 10 connections/sec on average, but it doesn't cause a problem
because they time out.
