httpd-dev mailing list archives

From Marc Slemko <ma...@valis.worldgate.com>
Subject Re: more FIN_WAIT_2 analysis
Date Tue, 21 Jan 1997 20:05:57 GMT
This appears to be one problem; I have replicated it here on essentially
the same setup that Roy used.  Has anyone tried with a Windows client?  It
may give different behavior.

I agree with Roy that there is very little we can do about this.  It
appears that it is caused by Netscape simply not reading anything from the
connection until it wants to send something; when it does, it reads the
EOF and closes it.  If Netscape won't close the connection until it tries
to use it, and won't read anything from it until then, a workaround may be
impossible given that we have to work through the sockets API.
Bug-report-to-Netscape time?  Do they have no servers that
do keepalives yet?
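
To make the client-side point concrete, here is a minimal Python sketch of
the check an idle client could run on a kept-alive connection instead of
waiting until its next request.  The helper name peer_has_closed is
hypothetical, and this is only an illustration of the sockets-level idea,
not a description of what Netscape or any real browser does.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has already sent EOF on an idle connection.

    Hypothetical helper: polls without blocking, and peeks so that any
    real data waiting on the socket is left in place for the caller.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending, so no EOF has arrived yet
    try:
        # An empty result from recv() means the peer closed its side.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except OSError:
        return True  # a reset connection is treated as closed too
```

A client that ran such a check periodically and closed the socket when it
returned True would leave CLOSE_WAIT promptly, which in turn would let the
server side leave FIN_WAIT_2.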

However, this is not the whole problem.  There are definitely people that
do NOT have the problem with 1.1.x but do with 1.2, and NO_LINGCLOSE fixes
it for nearly all of them.  My thinking is that this is either a bug in
the lingering_close code or simply a result of adding a call to
shutdown(); on some platforms, shutdown() may try too hard to close the
connection properly.  If it is the latter case, a workaround could be very
hard.

The other thought I had is that perhaps the longer time spent in
lingering_close before it does the second shutdown() and the close()
results in dialup clients disconnecting and somehow causing problems.  I
haven't found an explanation at the protocol level to back that up yet,
but I am still considering whether it is possible.
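
For reference, the general shape of a lingering close can be sketched in
Python as below.  This is a simplified illustration under my own
assumptions, not the actual lingering_close code in 1.2: the function name
is reused only for readability, the poll_timeout and max_wait parameters
and their defaults are made up, and the final step is a plain close()
rather than a second shutdown() followed by close().

```python
import select
import socket
import time


def lingering_close(sock, poll_timeout=2.0, max_wait=30.0):
    """Half-close, drain until the peer sends EOF or goes quiet, then close.

    Returns True if the peer's EOF was seen before the socket was closed.
    The timeout values are illustrative assumptions.
    """
    try:
        sock.shutdown(socket.SHUT_WR)  # send our FIN but keep reading
    except OSError:
        sock.close()
        return False

    saw_eof = False
    deadline = time.monotonic() + max_wait
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # overall time budget used up
        wait = min(poll_timeout, remaining)
        readable, _, _ = select.select([sock], [], [], wait)
        if not readable:
            break  # peer stayed silent for a whole poll interval
        try:
            data = sock.recv(512)  # discard whatever the peer still sends
        except OSError:
            break
        if not data:
            saw_eof = True  # peer closed its side as well
            break
    sock.close()
    return saw_eof
```

With a peer that never reads or closes, the function returns False after
the poll interval expires, which is the case where the server's end of a
TCP connection would be left waiting for the client's FIN.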

Now that Roy has detailed the things I was starting to figure out
yesterday, I agree that it appears some sort of lingering close is the
only way out, but I'm afraid making it work well on all systems may be
impossible. 

On Tue, 21 Jan 1997, Roy T. Fielding wrote:

> I have been able to reproduce the FIN_WAIT_2 condition consistently
> using "Mozilla/3.01Gold (X11; I; SunOS 5.5 sun4m)" with the 1.2b5-dev
> server on the same machine (Solaris 2.5, sparc4).  It is caused by
> Netscape not closing the connection when a close is received from
> the server after a keep-alive timeout.  It is equally present when
> the same client has a keep-alive connection to an Apache 1.1.3 server.
> 
> In my case, the FIN_WAIT_2 ends when the first of the following occur:
>    1) the client attempts to make another request on the same
>       connection and finds that it is closed, or closes the connection
>       so that it can make a request to some other server.
>    2) the client is exited
>    3) the Solaris kernel times out FIN_WAIT_2
> 
> At the same time, the client lists its sockets in CLOSE_WAIT state
> (and remains in that state until (1) or (2) above).  In other words,
> Netscape is just being frickin lazy and wasteful.
> 
> My guess is that there exists a Windows version that doesn't close
> the client sockets on (2) and, when combined with an OS that doesn't
> have a FIN_WAIT_2 timeout, results in FIN_WAIT_2-forever conditions.
> Note, however, that a client which is just sitting idle will hold
> onto its last connections as well, so attempting to run without a FIN_WAIT_2
> timeout will pretty much require disabling keep-alive for Mozilla/*.
> 
> I'll try fooling around with parts of the lingering_close routine to
> see if it has any effect, but given that I see the same occurring on
> Apache 1.1.3, I doubt that we can do anything about it.
> 
> .....Roy
> 

