httpd-dev mailing list archives

From Marc Slemko <ma...@znep.com>
Subject Re: roy's l_c perf patch and spareservers
Date Sat, 15 Feb 1997 07:02:32 GMT
On Fri, 14 Feb 1997, Roy T. Fielding wrote:

> >I put Roy's l_c perf patch on HotWired's servers for a few hours... but
> >the FIN_WAIT_2s flew through the roof, so I had to go back to
> >-DNO_LINGCLOSE.  (IRIX 5.3).  There was some speculation that this might
> >help the FIN_WAIT_2 situation, but it doesn't look like it.  As far as
> >performance goes I didn't really get a chance to compare, I'll try again.
> 
> Thanks, that narrows it a bit.  I think what may be occurring is that
> the shutdown(sd, 1) is changing some flag on IRIX's (and apparently other)
> TCP stacks such that the later close() is not working, or we just block
> on shutdown(sd, 1) because the OS is too stupid to realize that it is
> supposed to be a non-blocking call.  Can you modify that test program
> you posted earlier and truss it to see where it gets lost?  I have run
> out of ideas here, since it works fine on our Solaris machines.  Hmmm,
> I might be able to test it on SunOS4 tomorrow.

Hang on.  Dean's trouble is with the FIN_WAIT_2s, not with any of the
other performance issues: the FIN_WAIT_2s hurt his servers too much for
him to keep lingering_close running long enough to get a good idea of
performance.  This is unrelated to speeding up the code; it is the problem
we have had all along of connections hanging in FIN_WAIT_2.  It is not
fixed in b6.  Connections still hang in FIN_WAIT_2.

> 
> Is there any portable way to abort a connection?

Define abort.  On some systems, using SO_LINGER with a linger time of 0
will cause the closing end to send a RST; this is how Navigator does it.
But I can't see where aborting connections would do us much good, unless
you are thinking of doing it instead of the close() at the end.
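
For reference, a minimal sketch of the SO_LINGER approach described above.
The helper name abort_connection is made up for illustration and is not
from the Apache source.  Whether close() actually sends a RST with these
settings depends on the platform's TCP stack, so treat this as an
assumption to test rather than a portable guarantee.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close sd abortively.
 * With l_onoff = 1 and l_linger = 0, many TCP stacks discard unsent data
 * and send a RST on close() instead of a FIN.  This is stack-dependent.
 * Returns 0 on success, -1 on failure; sd is closed in either case. */
int abort_connection(int sd)
{
    struct linger lg;

    lg.l_onoff = 1;   /* enable SO_LINGER */
    lg.l_linger = 0;  /* linger time of zero seconds */

    if (setsockopt(sd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0) {
        /* Could not set the option; still close so sd is not leaked. */
        close(sd);
        return -1;
    }
    return close(sd);
}
```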

> 
> >One idle thought I had was that we might try playing with the order of
> >things -- do the shutdown(sd,1) after a select() timeout or a successful
> >read(). 
> 
> I don't think that would help.  Maybe just doing a normal close() and
> no prior shutdown, but the most likely culprit is the shutdown call.

We need the shutdown to do the half close or else we will be stuck waiting
for the select timeout for _all_ connections; right now we should be
exiting most of the time as soon as we get the EOF from read() when the
client closes their end.  Nearly all of my test connections were in and
out of lingering_close in under a second.  Example:

[Fri Feb 14 23:45:33 1997] entered lingering_close
[Fri Feb 14 23:45:33 1997] after half-close in lingering_close
[Fri Feb 14 23:45:33 1997] after select in lingering_close, read_rv = 0
[Fri Feb 14 23:45:33 1997] after close in lingering_close

It could behave differently on different platforms.  The server used for
testing still has the 10s select timeout.  On a SunOS server I saw the
same behavior.

That log entry is from a client on taz to a server on my home box
connected via modem; ~290ms RTT.  You should normally see ~RTT extra time
being spent blocked in the select loop in lingering_close if everything is
closed properly.
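
To make the sequence in that log concrete, here is a rough sketch of the
half-close, select, read, close order.  It is not the actual
lingering_close code from the server; the function name, buffer size, and
return values are assumptions chosen for illustration.  shutdown(sd, 1)
is written as shutdown(sd, SHUT_WR), which is the same call on systems
that define the constant.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Illustrative sketch only, not the server's lingering_close().
 * Returns 0 if the peer's EOF was seen, 1 if select() timed out,
 * -1 on error.  sd is always closed before returning. */
int lingering_close_sketch(int sd, int timeout_secs)
{
    char junk[512];
    int result;

    /* Half-close: send our FIN but keep the read side open. */
    if (shutdown(sd, SHUT_WR) < 0) {
        close(sd);
        return -1;
    }

    for (;;) {
        fd_set rfds;
        struct timeval tv;
        ssize_t r;
        int n;

        FD_ZERO(&rfds);
        FD_SET(sd, &rfds);
        tv.tv_sec = timeout_secs;  /* timeout restarts on each pass */
        tv.tv_usec = 0;

        n = select(sd + 1, &rfds, NULL, NULL, &tv);
        if (n < 0 && errno == EINTR)
            continue;                       /* interrupted, try again */
        if (n < 0) { result = -1; break; }
        if (n == 0) { result = 1; break; }  /* client never closed */

        r = read(sd, junk, sizeof(junk));
        if (r == 0) { result = 0; break; }  /* EOF: client closed its end */
        if (r < 0) { result = -1; break; }
        /* r > 0: discard leftover client data and keep waiting */
    }

    close(sd);
    return result;
}
```

In the common case the loop exits on the read() returning 0, which is the
"read_rv = 0" line in the log; only a client that never closes its end
should cost the full select timeout.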

If the process has to wait for lingering_close to finish before
continuing, then however you do it, it will be slower than just doing a
close().


