httpd-dev mailing list archives

From Marc Slemko <ma...@znep.com>
Subject more lingering_close...
Date Sun, 09 Feb 1997 04:34:31 GMT
Below is a bit of discussion on the issues surrounding
lingering_close.  It starts off going through an example of a 
PUT, then moves on to persistent connections.

(alive is the client, obed is the server.  alive running FreeBSD 2.1.x, 
obed running SunOS 4.1.x)

Ok, I'll start with a recap of the situation with PUTs.  Below
is an annotated tcpdump from doing a PUT to an Apache server
w/o lingering_close from a client running Navigator Gold.  The
server will return an error message because that is how it is set up.

19:57:29.171129 alive.worldgate.com.4026 > obed.cs.ualberta.ca.6060: . 234:746(512) ack
1 win 16384 (DF)
19:57:29.171291 alive.worldgate.com.4026 > obed.cs.ualberta.ca.6060: . 746:1258(512) ack
1 win 16384 (DF)

The first packet is the headers from the client to the server.  The second
is the start of the data being PUT.

19:57:29.600864 obed.cs.ualberta.ca.6060 > alive.worldgate.com.4026: P 1:338(337) ack 746
win 4096

This is the error message from the server to the client.  It is 
sent by the server right after it gets our first packet.

19:57:29.601314 alive.worldgate.com.4026 > obed.cs.ualberta.ca.6060: . 1258:1770(512) ack
338 win 16047 (DF)
19:57:29.601569 alive.worldgate.com.4026 > obed.cs.ualberta.ca.6060: . 1770:2282(512) ack
338 win 16047 (DF)

This is the document continuing to be sent.

19:57:29.610947 obed.cs.ualberta.ca.6060 > alive.worldgate.com.4026: F 338:338(0) ack 746
win 4096

This is the server sending the FIN to close the connection; it sends
it right after the error message at 19:57:29.600864, but it shows up
later in the trace because of the latency (~250ms between client and
server).

19:57:29.611247 alive.worldgate.com.4026 > obed.cs.ualberta.ca.6060: . ack 339 win 16307
(DF)

Here we ACK the server's FIN, but we are still in the middle of
sending our data.

19:57:29.611549 obed.cs.ualberta.ca.6060 > alive.worldgate.com.4026: R 865216001:865216001(0)
win 0
19:57:29.880585 obed.cs.ualberta.ca.6060 > alive.worldgate.com.4026: R 865216338:865216338(0)
win 4096
19:57:29.990543 obed.cs.ualberta.ca.6060 > alive.worldgate.com.4026: R 865216338:865216338(0)
win 4096
19:57:29.990921 obed.cs.ualberta.ca.6060 > alive.worldgate.com.4026: R 865216339:865216339(0)
win 4096

The server doesn't like continuing to get data after it said the
connection was closed, so it sends a RST.  There is sort of one for
each packet the client sent after the headers, but not exactly; I
would need to make tcpdump print absolute sequence numbers to figure
out exactly which RST is in response to which packet.  Navigator
generates 'A network error occurred while Netscape was receiving
data.  (Network Error: Connection reset by peer)'.

In this situation, if lingering_close() were used, the client would
not get the RSTs and so would not pop up the TCP error; it should, of
course, pop up a box with the error that the HTTP server sent instead.

One might expect that the server would not generate RSTs to packets until
the client had closed its half of the connection; at the time the
server sends the RSTs, the connection is only half-closed.  However,
that isn't the way it happens.

*** THIS IS IMPORTANT: *** When the client gets the RST, the RST
includes the sequence number of the last packet from the server
ACKed by the client at the time the RST was sent.  The client WILL
normally flush any buffered incoming data received from the server
after that sequence number.  This means that if the client is to
reliably get the entire error message to display, the server MUST
NOT send a RST until it has received an ACK of the last packet in
the error message it sends to the client.  Nothing the client
can do without modifying the TCP stack can change this, no matter what
it does with errors.
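
To make the idea concrete, a lingering close amounts to half-closing
our end, then reading and throwing away whatever the client sends
until it closes its side or we give up, and only then doing the real
close().  A minimal sketch (not the actual Apache code; the timeout
value and function name are made up for illustration):

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Half-close, drain, then close.  This keeps the kernel from
     * generating a RST while the client is still sending, so the
     * client has a chance to read the full response first. */
    static void lingering_close_sketch(int sd)
    {
        char junk[512];
        fd_set fds;
        struct timeval tv;

        shutdown(sd, SHUT_WR);      /* send our FIN, but keep reading */

        for (;;) {
            FD_ZERO(&fds);
            FD_SET(sd, &fds);
            tv.tv_sec = 2;          /* arbitrary per-read timeout */
            tv.tv_usec = 0;

            if (select(sd + 1, &fds, NULL, NULL, &tv) <= 0)
                break;              /* timed out or error: give up */
            if (read(sd, junk, sizeof(junk)) <= 0)
                break;              /* client closed its side (or reset) */
        }
        close(sd);
    }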

I think this illustrates the issue at the TCP layer and how you have
problems when the server closes the connection but the client keeps
sending.  It is only an example, however, since the above can
be solved by simply not sending a response until we get all the
data.  A bit wasteful and lame, but a possible workaround.  I think 
some of Netscape's newer servers do this; older ones appear to act
just like Apache 1.1.x.  
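
For what it's worth, that workaround is simple enough to sketch: just
read and discard the body (based on Content-Length) before writing
the error.  This is only an illustration, not how any particular
server does it; the function name is made up:

    #include <sys/types.h>
    #include <unistd.h>

    /* Read and throw away content_length bytes of request body so
     * the client finishes sending before we respond and close.
     * Timeouts and error handling are omitted. */
    static void discard_request_body(int sd, long content_length)
    {
        char junk[1024];
        long remaining = content_length;

        while (remaining > 0) {
            size_t want = remaining < (long)sizeof(junk)
                              ? (size_t)remaining : sizeof(junk);
            ssize_t n = read(sd, junk, want);

            if (n <= 0)
                break;          /* client went away; stop early */
            remaining -= n;
        }
        /* ...only now send the error response and close the connection. */
    }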

Now, make sure you understand everything above, or at least have read
it over enough to have some idea of what I am saying.  With that in
mind, we can move on to the persistent connection case.

First, a review of some of the features of persistent connections:
	- more than one request per connection
	- the server can close the connection at any time
	- if the client does not get a response to a query before the 
	  connection closes, it should retry in a new connection.
	- the client can pipeline requests; i.e. send a new
	  request before the responses to the previous ones are
	  received (see the sketch just after this list).
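
Here is a rough sketch of what pipelining looks like from the client
side: both requests are written before any response is read.  The
socket is assumed to already be connected and the function name is
made up; error handling is omitted.

    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Write two requests back-to-back, then read whatever responses
     * come back.  If the server closes (or resets) the connection
     * early, the read loop ends before the second response is seen. */
    static void send_pipelined(int sd, const char *req1, const char *req2)
    {
        char buf[4096];
        ssize_t n;

        write(sd, req1, strlen(req1));
        write(sd, req2, strlen(req2));  /* sent before any reply arrives */

        while ((n = read(sd, buf, sizeof(buf))) > 0)
            ;                           /* a real client parses responses here */
    }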

There are many possible cases where the server can close the connection
while the client is sending a new request.  Whenever I present an
example, people say "but that exact example isn't likely to happen
that often".  That is true, however there are numerous different
situations where it could happen.  This example is just one case;
the basic issues remain the same across all such cases.

Consider the client making the two following requests as pipelined requests:

	GET /not_here.html HTTP/1.1
	Host: foobar

	GET /file.html HTTP/1.1
	Host: foobar

The first file does not exist on the server; the second does.  Assume
that the server closes the connection if it sends a 404, just like
Apache does.  Also assume that the server sends the response to the
first query and the FIN before the second query gets there.  When the
second query arrives, the server will send a RST to the client.  If
the error message from the first request is still in the client's
buffers, it will get flushed by the RST.

At this point, one of two things can happen depending on the client:

	- the client can pop up a TCP error message (NOT a file-not-found
	  error, since the RST caused it to lose the 404 response).
	- the client can silently retry the request

In the first case, things are already broken.  I would not be
surprised if at least some clients exhibited the first behavior.  If
the client retries, it can have the same problem again.  For the
client to ensure it does not get the same error again, it needs to
stop pipelining on the retry.  We see there is a workaround that the
client _can_ implement, but it is wasteful (both because it requires
the connection to be retried and because pipelining isn't used the
second time) and may not be implemented by clients.
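
In other words, the fallback a careful client would need looks
something like the following.  This is only a sketch;
open_connection(), send_request() and read_responses() are
hypothetical helpers, not real library calls.

    #include <unistd.h>

    int  open_connection(void);                 /* returns a connected socket */
    void send_request(int sd, const char *req); /* writes one request */
    int  read_responses(int sd, int want);      /* returns how many complete
                                                   responses arrived before the
                                                   connection ended */

    /* Send the requests pipelined; if the connection dies before all
     * the responses arrive, retry the leftovers one per connection
     * with no pipelining (wasteful, but safe). */
    static void fetch_with_retry(const char **requests, int nreq)
    {
        int sd = open_connection();
        int got, i;

        for (i = 0; i < nreq; i++)
            send_request(sd, requests[i]);      /* pipelined */
        got = read_responses(sd, nreq);
        close(sd);

        for (i = got; i < nreq; i++) {
            sd = open_connection();
            send_request(sd, requests[i]);
            read_responses(sd, 1);
            close(sd);
        }
    }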

This brings us to the most important point.  I think it all comes
down to the fact that Apache may or may not need lingering_close() to
properly support HTTP/1.1 clients, depending entirely on how those
clients are implemented.  We don't know yet how they will be
implemented.  I am not sure the HTTP/1.1 spec is clear on this issue,
because it doesn't deal with many of the TCP-layer aspects.

This means that in 1.2 we can either:
	- disable LINGERING_CLOSE, and plan a maintenance release with it
	  enabled if it turns out that clients, once implemented, need
	  it.
	- enable it and be safe no matter what the client does.


