Return-Path: Delivered-To: apmail-jakarta-commons-dev-archive@apache.org Received: (qmail 51084 invoked from network); 6 Jan 2002 06:21:11 -0000 Received: from unknown (HELO nagoya.betaversion.org) (192.18.49.131) by daedalus.apache.org with SMTP; 6 Jan 2002 06:21:11 -0000 Received: (qmail 14764 invoked by uid 97); 6 Jan 2002 06:21:20 -0000 Delivered-To: qmlist-jakarta-archive-commons-dev@jakarta.apache.org Received: (qmail 14717 invoked by uid 97); 6 Jan 2002 06:21:18 -0000 Mailing-List: contact commons-dev-help@jakarta.apache.org; run by ezmlm Precedence: bulk List-Unsubscribe: List-Subscribe: List-Help: List-Post: List-Id: "Jakarta Commons Developers List" Reply-To: "Jakarta Commons Developers List" Delivered-To: mailing list commons-dev@jakarta.apache.org Received: (qmail 14706 invoked from network); 6 Jan 2002 06:21:17 -0000 Date: 5 Jan 2002 22:21:17 -0800 Message-ID: <20020106062117.9074.cpmta@c006.snv.cp.net> X-Sent: 6 Jan 2002 06:21:17 GMT Content-Type: text/plain Content-Disposition: inline Mime-Version: 1.0 To: commons-dev@jakarta.apache.org From: otisg@ivillage.com X-Mailer: Web Mail 3.9.3.5 X-Sent-From: otisg@ivillage.com Subject: bug: HTTPClient hangs when the server doesn't close the connection X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N Hello, This looks like a bug... I noticed that HTTPClient hangs in some cases. I turned on HTTPClient logging and it looks like it likes to hang when the server does not close the connection. Check these logs, they show things pretty clearly. First, a healthy interaction with a web server. httpclient.wire logging turns this up when a normal response is received: INFO [Crawler-5] httpclient.wire - >> "GET HTTP/1.1 ": 06 Jan 2002 00:01:42,505 INFO [Crawler-5] httpclient.wire - >> "Host: www.faz.de ": 06 Jan 2002 00:01:42,507 INFO [Crawler-5] httpclient.wire - >> "Content-Length: 0 ": 06 Jan 2002 00:01:42,507 INFO [Crawler-5] httpclient.wire - >> "user-agent: None ": 06 Jan 2002 00:01:42,507 INFO [Crawler-5] httpclient.wire - >> \r\n: 06 Jan 2002 00:01:42,508 INFO [Crawler-5] httpclient.wire - << "HTTP/1.1 400 Bad Request" [\r\n]: 06 Jan 2002 00:01:42,702 INFO [Crawler-5] httpclient.wire - << "Server: Microsoft-IIS/5.0" [\r\n]: 06 Jan 2002 00:01:42,703 INFO [Crawler-5] httpclient.wire - << "Date: Sun, 06 Jan 2002 05:01:33 GMT" [\r\n]: 06 Jan 2002 00:01:42,703 INFO [Crawler-5] httpclient.wire - << "Set-Cookie: BIGipServerfazde=906039724.20480.0000; path=/" [\r\n]: 06 Jan 2002 00:01:42,704 INFO [Crawler-5] httpclient.wire - << "Content-Type: text/html" [\r\n]: 06 Jan 2002 00:01:42,704 INFO [Crawler-5] httpclient.wire - << "Content-Length: 87" [\r\n]: 06 Jan 2002 00:01:42,705 INFO [Crawler-5] httpclient.wire - << "ErrorThe parameter is incorrect. ": 06 Jan 2002 00:01:42,713 The above includes a nice GET request, response headers, and the body. It happens to be a 400 Bad Request, but that doesn't matter here. On the other hand, when HTTPClient hung its interaction with the web server looked something like this: INFO [Crawler-5] httpclient.wire - >> "GET HTTP/1.1 ": 06 Jan 2002 00:01:42,950 INFO [Crawler-5] httpclient.wire - >> "Host: www.focus.de ": 06 Jan 2002 00:01:42,951 INFO [Crawler-5] httpclient.wire - >> "Content-Length: 0 ": 06 Jan 2002 00:01:42,951 INFO [Crawler-5] httpclient.wire - >> "user-agent: Librisia {OG}/0.1 ": 06 Jan 2002 00:01:42,951 INFO [Crawler-5] httpclient.wire - >> \r\n: 06 Jan 2002 00:01:42,952 INFO [Crawler-5] httpclient.wire - << " " [\r\n]: 06 Jan 2002 00:01:43,477 INFO [Crawler-5] httpclient.wire - << " FOCUS Online | Rubriken | Inhalt | Fehler" [\r\n]: 06 Jan 2002 00:01:43,478 ... more HTML here ... INFO [Crawler-5] httpclient.wire - << "Da wir unsere Seiten mehrmals täglich" [\r\n]: 06 Jan 2002 00:01:43,881 Notice a few things: 1. There is no response header, the first thing that HTTPClient gets back is the HTML body. 2. This server encountered some error (see 'Fehler' in the TITLE - Fehler means error auf Deutsch), maybe that's why it didn't respond properly. 3. The very last data sent from the server is some text, but note that it looks like it was cut off (there are no nor tags, even though the response contained a nice, correct HTML up to that point). So it looks like the server choked, started hanging or something, and HTTPClient stopped there as well and remained listening for the rest of the server's response, which never came. Finally: netstat -a|grep focus tcp 0 0 77-108-148-98.nyc:34555 focus.de:http ESTABLISHED Obviously, HTTPClient is still holding onto this connection to www.focus.de server! I noticed this behaviour with other servers, too. I don't have their httpclient.wire logs, but see this from netstat -a: tcp 0 0 77-108-148-98.nyc:34819 www.ananzi.co.za:http CLOSE_WAIT This looks like a HTTPClient bug to me. I don't know enough about the HTTPClient internals yet to fix it and send a diff. Notice that this also implies the that connection/socket timeout fix from a few weeks ago doesn't really work, which is what I observed, too. No matter what timeout I use, if connections do time out they invariably time out after a bit more than 3 minutes which, I presume, is some Java's internal, default socket timeout value. Does anyone know how to fix this 'hang bug'? Thanks, Otis P.S. If it helps, this is the last log line from HttpConnection, which seems to show that readLine() method is the method that got stuck. DEBUG [Crawler-5] org.apache.commons.httpclient.HttpConnection - HttpConnection.readLine(): 06 Jan 2002 00:01:43,881 _________________________________________________________________ iVillage.com: Solutions for Your Life Check out the most exciting women's community on the Web http://www.ivillage.com -- To unsubscribe, e-mail: For additional commands, e-mail: