commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ot...@ivillage.com
Subject Re: bug: HTTPClient hangs when the server doesn't close the connection
Date Sun, 06 Jan 2002 07:32:01 GMT
Hello,

I don't have a fix for this, but I found a bit more information about the cause related to
the code.
In the log I found that the problematic method is HttpMethodBase's readStatusLine method,
since the log looks like this:

DEBUG [Crawler-5] org.apache.commons.httpclient.HttpMethod - HttpMethodBase.readStatusLine(HttpState,HttpConnection):
06 Jan 2002 00:01:42,952
DEBUG [Crawler-5] org.apache.commons.httpclient.HttpConnection - HttpConnection.readLine():
06 Jan 2002 00:01:42,952
...
...
a bunch of log lines showing HttpConnection.readLine() follow.

The code in the readStatusLine method that I think is the culprit is this:

        String statusLine = conn.readLine();

        while(statusLine != null && !statusLine.startsWith("HTTP/")) {
            statusLine = conn.readLine();
        }


This while loop never ends.
Recall that the problem was that the server responded without any headers and returned the
HTML body right away.
So this loop will keep reading the body text desperately looking for a line that starts with
"HTTP/", and that line never comes.
I think that's problem #1.

I think there is another problem in HttpConnection's readLine() method because, as you saw
in the original message included below, the server stopped sending the body text and started
to hang and HttpConnection did not detect that.
Instead of closing the connection after a while it just held onto it.

The relevant portion of readLine() is this:

        for(;;) {
            int ch = _input.read();
//            log.debug("HttpConnection.readLine() read " + ch);
            if(ch < 0) {
                if(buf.length() == 0) {
                    return null;
                } else {
                    break;
                }
            } else if (ch == '\r') {
//                log.debug("HttpConnection.readLine() found \\r, continuing");
                continue;
            } else if (ch == '\n') {
//                log.debug("HttpConnection.readLine() found \\n, breaking");
                break;
            }
            buf.append((char)ch);
        }

_input variable is InputStream gotten from the instance of Socket.
My guess is that _input.read() is where things got stuck.

Hm, I just realized I wasn't setting the socket timeout on HttpConnection/Socket like I thought
I was. Uh :(
Nevertheless, things should not get stuck like this, especially if setting the timeout is
not enforced.
Although the timeout currently has no effect as onnections are still not timing out after
30 seconds like I'd like them to.

Otis




On Sat, 05 January 2002, otisg@ivillage.com wrote:

> 
> Hello,
> 
> This looks like a bug...
> I noticed that HTTPClient hangs in some cases.  I turned on HTTPClient logging and it
looks like it likes to hang when the server does not close the connection.
> Check these logs, they show things pretty clearly.
> 
> First, a healthy interaction with a web server.  httpclient.wire logging turns this up
when a normal response is received:
> 
> INFO  [Crawler-5] httpclient.wire - >> "GET  HTTP/1.1
> ": 06 Jan 2002 00:01:42,505
> INFO  [Crawler-5] httpclient.wire - >> "Host: www.faz.de
> ": 06 Jan 2002 00:01:42,507
> INFO  [Crawler-5] httpclient.wire - >> "Content-Length: 0
> ": 06 Jan 2002 00:01:42,507
> INFO  [Crawler-5] httpclient.wire - >> "user-agent: None
> ": 06 Jan 2002 00:01:42,507
> INFO  [Crawler-5] httpclient.wire - >> \r\n: 06 Jan 2002 00:01:42,508
> INFO  [Crawler-5] httpclient.wire - << "HTTP/1.1 400 Bad Request" [\r\n]: 06 Jan
2002 00:01:42,702
> INFO  [Crawler-5] httpclient.wire - << "Server: Microsoft-IIS/5.0" [\r\n]: 06 Jan
2002 00:01:42,703
> INFO  [Crawler-5] httpclient.wire - << "Date: Sun, 06 Jan 2002 05:01:33 GMT" [\r\n]:
06 Jan 2002 00:01:42,703
> INFO  [Crawler-5] httpclient.wire - << "Set-Cookie: BIGipServerfazde=906039724.20480.0000;
path=/" [\r\n]: 06 Jan 2002 00:01:42,704
> INFO  [Crawler-5] httpclient.wire - << "Content-Type: text/html" [\r\n]: 06 Jan
2002 00:01:42,704
> INFO  [Crawler-5] httpclient.wire - << "Content-Length: 87" [\r\n]: 06 Jan 2002
00:01:42,705
> INFO  [Crawler-5] httpclient.wire - << "<html><head><title>Error</title></head><body>The
parameter is incorrect. </body></html>": 06 Jan 2002 00:01:42,713
> 
> 
> The above includes a nice GET request, response headers, and the body.
> It happens to be a 400 Bad Request, but that doesn't matter here.
> 
> 
> On the other hand, when HTTPClient hung its interaction with the web server looked something
like this:
> 
> INFO  [Crawler-5] httpclient.wire - >> "GET  HTTP/1.1
> ": 06 Jan 2002 00:01:42,950
> INFO  [Crawler-5] httpclient.wire - >> "Host: www.focus.de
> ": 06 Jan 2002 00:01:42,951
> INFO  [Crawler-5] httpclient.wire - >> "Content-Length: 0
> ": 06 Jan 2002 00:01:42,951
> INFO  [Crawler-5] httpclient.wire - >> "user-agent: Librisia {OG}/0.1
> ": 06 Jan 2002 00:01:42,951
> INFO  [Crawler-5] httpclient.wire - >> \r\n: 06 Jan 2002 00:01:42,952
> INFO  [Crawler-5] httpclient.wire - << "<!DOCTYPE HTML PUBLIC "-//IETF//DTD
HTML 3.0//EN"> <HTML>" [\r\n]: 06 Jan 2002 00:01:43,477
> INFO  [Crawler-5] httpclient.wire - << "<HEAD> <TITLE>FOCUS Online
| Rubriken | Inhalt | Fehler</TITLE>" [\r\n]: 06 Jan 2002 00:01:43,478
> ...
> more HTML here
> ...
> INFO  [Crawler-5] httpclient.wire - << "Da wir unsere Seiten mehrmals t&auml;glich"
[\r\n]: 06 Jan 2002 00:01:43,881
> 
> 
> Notice a few things:
> 1. There is no response header, the first thing that HTTPClient gets back is the HTML
body.
> 
> 2. This server encountered some error (see 'Fehler' in the TITLE - Fehler means error
auf Deutsch), maybe that's why it didn't respond properly.
> 
> 3. The very last data sent from the server is some text, but note that it looks like
it was cut off (there are no </BODY> nor </HTML> tags, even though the response
contained a nice, correct HTML up to that point).
> So it looks like the server choked, started hanging or something, and HTTPClient stopped
there as well and remained listening for the rest of the server's response, which never came.
> 
> Finally:
> 
> netstat -a|grep focus
> tcp        0      0 77-108-148-98.nyc:34555 focus.de:http           ESTABLISHED
> 
> 
> Obviously, HTTPClient is still holding onto this connection to www.focus.de server!
> 
> 
> I noticed this behaviour with other servers, too.  I don't have their httpclient.wire
logs, but see this from netstat -a:
> 
> tcp        0      0 77-108-148-98.nyc:34819 www.ananzi.co.za:http   CLOSE_WAIT
> 
> 
> This looks like a HTTPClient bug to me.
> I don't know enough about the HTTPClient internals yet to fix it and send a diff.
> Notice that this also implies the that connection/socket timeout fix from a few weeks
ago doesn't really work, which is what I observed, too.  No matter what timeout I use, if
connections do time out they invariably time out after a bit more than 3 minutes which, I
presume, is some Java's internal, default socket timeout value.
> 
> Does anyone know how to fix this 'hang bug'?
> 
> Thanks,
> Otis
> P.S.
> If it helps, this is the last log line from HttpConnection, which seems to show that
readLine() method is the method that got stuck.
> 
> DEBUG [Crawler-5] org.apache.commons.httpclient.HttpConnection - HttpConnection.readLine():
06 Jan 2002 00:01:43,881
> 
> _________________________________________________________________
> iVillage.com: Solutions for Your Life 
> Check out the most exciting women's community on the Web   
> http://www.ivillage.com
> 
> --
> To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>

_________________________________________________________________
iVillage.com: Solutions for Your Life 
Check out the most exciting women's community on the Web   
http://www.ivillage.com

--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>


Mime
View raw message