commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ot...@ivillage.com
Subject bug: HTTPClient hangs when the server doesn't close the connection
Date Sun, 06 Jan 2002 06:21:17 GMT
Hello,

This looks like a bug...
I noticed that HTTPClient hangs in some cases.  I turned on HTTPClient logging and it looks
like it likes to hang when the server does not close the connection.
Check these logs, they show things pretty clearly.

First, a healthy interaction with a web server.  httpclient.wire logging turns this up when
a normal response is received:

INFO  [Crawler-5] httpclient.wire - >> "GET  HTTP/1.1
": 06 Jan 2002 00:01:42,505
INFO  [Crawler-5] httpclient.wire - >> "Host: www.faz.de
": 06 Jan 2002 00:01:42,507
INFO  [Crawler-5] httpclient.wire - >> "Content-Length: 0
": 06 Jan 2002 00:01:42,507
INFO  [Crawler-5] httpclient.wire - >> "user-agent: None
": 06 Jan 2002 00:01:42,507
INFO  [Crawler-5] httpclient.wire - >> \r\n: 06 Jan 2002 00:01:42,508
INFO  [Crawler-5] httpclient.wire - << "HTTP/1.1 400 Bad Request" [\r\n]: 06 Jan 2002
00:01:42,702
INFO  [Crawler-5] httpclient.wire - << "Server: Microsoft-IIS/5.0" [\r\n]: 06 Jan 2002
00:01:42,703
INFO  [Crawler-5] httpclient.wire - << "Date: Sun, 06 Jan 2002 05:01:33 GMT" [\r\n]:
06 Jan 2002 00:01:42,703
INFO  [Crawler-5] httpclient.wire - << "Set-Cookie: BIGipServerfazde=906039724.20480.0000;
path=/" [\r\n]: 06 Jan 2002 00:01:42,704
INFO  [Crawler-5] httpclient.wire - << "Content-Type: text/html" [\r\n]: 06 Jan 2002
00:01:42,704
INFO  [Crawler-5] httpclient.wire - << "Content-Length: 87" [\r\n]: 06 Jan 2002 00:01:42,705
INFO  [Crawler-5] httpclient.wire - << "<html><head><title>Error</title></head><body>The
parameter is incorrect. </body></html>": 06 Jan 2002 00:01:42,713


The above includes a nice GET request, response headers, and the body.
It happens to be a 400 Bad Request, but that doesn't matter here.


On the other hand, when HTTPClient hung its interaction with the web server looked something
like this:

INFO  [Crawler-5] httpclient.wire - >> "GET  HTTP/1.1
": 06 Jan 2002 00:01:42,950
INFO  [Crawler-5] httpclient.wire - >> "Host: www.focus.de
": 06 Jan 2002 00:01:42,951
INFO  [Crawler-5] httpclient.wire - >> "Content-Length: 0
": 06 Jan 2002 00:01:42,951
INFO  [Crawler-5] httpclient.wire - >> "user-agent: Librisia {OG}/0.1
": 06 Jan 2002 00:01:42,951
INFO  [Crawler-5] httpclient.wire - >> \r\n: 06 Jan 2002 00:01:42,952
INFO  [Crawler-5] httpclient.wire - << "<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML
3.0//EN"> <HTML>" [\r\n]: 06 Jan 2002 00:01:43,477
INFO  [Crawler-5] httpclient.wire - << "<HEAD> <TITLE>FOCUS Online | Rubriken
| Inhalt | Fehler</TITLE>" [\r\n]: 06 Jan 2002 00:01:43,478
...
more HTML here
...
INFO  [Crawler-5] httpclient.wire - << "Da wir unsere Seiten mehrmals t&auml;glich"
[\r\n]: 06 Jan 2002 00:01:43,881


Notice a few things:
1. There is no response header, the first thing that HTTPClient gets back is the HTML body.

2. This server encountered some error (see 'Fehler' in the TITLE - Fehler means error auf
Deutsch), maybe that's why it didn't respond properly.

3. The very last data sent from the server is some text, but note that it looks like it was
cut off (there are no </BODY> nor </HTML> tags, even though the response contained
a nice, correct HTML up to that point).
So it looks like the server choked, started hanging or something, and HTTPClient stopped there
as well and remained listening for the rest of the server's response, which never came.

Finally:

netstat -a|grep focus
tcp        0      0 77-108-148-98.nyc:34555 focus.de:http           ESTABLISHED


Obviously, HTTPClient is still holding onto this connection to www.focus.de server!


I noticed this behaviour with other servers, too.  I don't have their httpclient.wire logs,
but see this from netstat -a:

tcp        0      0 77-108-148-98.nyc:34819 www.ananzi.co.za:http   CLOSE_WAIT


This looks like a HTTPClient bug to me.
I don't know enough about the HTTPClient internals yet to fix it and send a diff.
Notice that this also implies the that connection/socket timeout fix from a few weeks ago
doesn't really work, which is what I observed, too.  No matter what timeout I use, if connections
do time out they invariably time out after a bit more than 3 minutes which, I presume, is
some Java's internal, default socket timeout value.

Does anyone know how to fix this 'hang bug'?

Thanks,
Otis
P.S.
If it helps, this is the last log line from HttpConnection, which seems to show that readLine()
method is the method that got stuck.

DEBUG [Crawler-5] org.apache.commons.httpclient.HttpConnection - HttpConnection.readLine():
06 Jan 2002 00:01:43,881

_________________________________________________________________
iVillage.com: Solutions for Your Life 
Check out the most exciting women's community on the Web   
http://www.ivillage.com

--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>


Mime
View raw message