commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ot...@ivillage.com
Subject Re: bug: HTTPClient hangs when the server doesn't close the connection
Date Sun, 06 Jan 2002 08:43:21 GMT
I wanted a simple way to test HTTPClient with a server whose response I can easily manipulate,
so I got this little perl server to do that for me.

A simple perl server that listens to a port, prints something, and sleeps.

It shows that:

1. HTTPClient does not time out if the server sends something back often enough that there
is never a pause greater than the specified timeout.

2. HTTPClient does time out if the server does not send any data back for a period longer
than the specified timeout.

3. If the server responds without any headers (Status line in particular), but sends something
(e.g. body) back periodically (to satisfy 1.) HTTPClient will keep reading the data forever,
hoping to find the Status line.
This needs to be changed, in my opinion, so that HTTPClient gives up either after some amount
of time, or after it reads a certain number of lines, or after it reads a certain amount of
data and does not find the Status line, or ...


Here is the perl server:

#!/usr/bin/perl -w

use strict;
use IO::Socket;

sub Slayer {
  1 until ( -1 == waitpid(-1,0));
  $SIG{CHLD} = \&Slayer;
}

$SIG{CHLD} = \&Slayer;

my $server = IO::Socket::INET->new(LocalPort => 8080,
                                   Type      => SOCK_STREAM,
                                   Reuse     => 1,
                                   Listen    => 10) or die "Oink $@\n";
my $client;
while ( $client = $server->accept())
{
  next if my $pid = fork;
  die "fork - $!\n" unless defined $pid;
  select $client;
  $| = 1;

  while (1)
  {
    print "Slow server response line\n";
    sleep 20;
  }
}
continue
{
  close($client);
}

Just change/remove the sleep line depending on what and how you want the server to respond.

Ok, now I really need to go to bed or I'll miss dim-sum tomorrow morning.

Otis


On Sat, 05 January 2002, otisg@ivillage.com wrote:

> 
> Hello,
> 
> I don't have a fix for this, but I found a bit more information about the cause related
to the code.
> In the log I found that the problematic method is HttpMethodBase's readStatusLine method,
since the log looks like this:
> 
> DEBUG [Crawler-5] org.apache.commons.httpclient.HttpMethod - HttpMethodBase.readStatusLine(HttpState,HttpConnection):
06 Jan 2002 00:01:42,952
> DEBUG [Crawler-5] org.apache.commons.httpclient.HttpConnection - HttpConnection.readLine():
06 Jan 2002 00:01:42,952
> ...
> ...
> a bunch of log lines showing HttpConnection.readLine() follow.
> 
> The code in the readStatusLine method that I think is the culprit is this:
> 
>         String statusLine = conn.readLine();
> 
>         while(statusLine != null && !statusLine.startsWith("HTTP/")) {
>             statusLine = conn.readLine();
>         }
> 
> 
> This while loop never ends.
> Recall that the problem was that the server responded without any headers and returned
the HTML body right away.
> So this loop will keep reading the body text desperately looking for a line that starts
with "HTTP/", and that line never comes.
> I think that's problem #1.
> 
> I think there is another problem in HttpConnection's readLine() method because, as you
saw in the original message included below, the server stopped sending the body text and started
to hang and HttpConnection did not detect that.
> Instead of closing the connection after a while it just held onto it.
> 
> The relevant portion of readLine() is this:
> 
>         for(;;) {
>             int ch = _input.read();
> //            log.debug("HttpConnection.readLine() read " + ch);
>             if(ch < 0) {
>                 if(buf.length() == 0) {
>                     return null;
>                 } else {
>                     break;
>                 }
>             } else if (ch == '\r') {
> //                log.debug("HttpConnection.readLine() found \\r, continuing");
>                 continue;
>             } else if (ch == '\n') {
> //                log.debug("HttpConnection.readLine() found \\n, breaking");
>                 break;
>             }
>             buf.append((char)ch);
>         }
> 
> _input variable is InputStream gotten from the instance of Socket.
> My guess is that _input.read() is where things got stuck.
> 
> Hm, I just realized I wasn't setting the socket timeout on HttpConnection/Socket like
I thought I was. Uh :(
> Nevertheless, things should not get stuck like this, especially if setting the timeout
is not enforced.
> Although the timeout currently has no effect as onnections are still not timing out after
30 seconds like I'd like them to.
> 
> Otis
> 
> 
> 
> 
> On Sat, 05 January 2002, otisg@ivillage.com wrote:
> 
> > 
> > Hello,
> > 
> > This looks like a bug...
> > I noticed that HTTPClient hangs in some cases.  I turned on HTTPClient logging and
it looks like it likes to hang when the server does not close the connection.
> > Check these logs, they show things pretty clearly.
> > 
> > First, a healthy interaction with a web server.  httpclient.wire logging turns this
up when a normal response is received:
> > 
> > INFO  [Crawler-5] httpclient.wire - >> "GET  HTTP/1.1
> > ": 06 Jan 2002 00:01:42,505
> > INFO  [Crawler-5] httpclient.wire - >> "Host: www.faz.de
> > ": 06 Jan 2002 00:01:42,507
> > INFO  [Crawler-5] httpclient.wire - >> "Content-Length: 0
> > ": 06 Jan 2002 00:01:42,507
> > INFO  [Crawler-5] httpclient.wire - >> "user-agent: None
> > ": 06 Jan 2002 00:01:42,507
> > INFO  [Crawler-5] httpclient.wire - >> \r\n: 06 Jan 2002 00:01:42,508
> > INFO  [Crawler-5] httpclient.wire - << "HTTP/1.1 400 Bad Request" [\r\n]:
06 Jan 2002 00:01:42,702
> > INFO  [Crawler-5] httpclient.wire - << "Server: Microsoft-IIS/5.0" [\r\n]:
06 Jan 2002 00:01:42,703
> > INFO  [Crawler-5] httpclient.wire - << "Date: Sun, 06 Jan 2002 05:01:33 GMT"
[\r\n]: 06 Jan 2002 00:01:42,703
> > INFO  [Crawler-5] httpclient.wire - << "Set-Cookie: BIGipServerfazde=906039724.20480.0000;
path=/" [\r\n]: 06 Jan 2002 00:01:42,704
> > INFO  [Crawler-5] httpclient.wire - << "Content-Type: text/html" [\r\n]: 06
Jan 2002 00:01:42,704
> > INFO  [Crawler-5] httpclient.wire - << "Content-Length: 87" [\r\n]: 06 Jan
2002 00:01:42,705
> > INFO  [Crawler-5] httpclient.wire - << "<html><head><title>Error</title></head><body>The
parameter is incorrect. </body></html>": 06 Jan 2002 00:01:42,713
> > 
> > 
> > The above includes a nice GET request, response headers, and the body.
> > It happens to be a 400 Bad Request, but that doesn't matter here.
> > 
> > 
> > On the other hand, when HTTPClient hung its interaction with the web server looked
something like this:
> > 
> > INFO  [Crawler-5] httpclient.wire - >> "GET  HTTP/1.1
> > ": 06 Jan 2002 00:01:42,950
> > INFO  [Crawler-5] httpclient.wire - >> "Host: www.focus.de
> > ": 06 Jan 2002 00:01:42,951
> > INFO  [Crawler-5] httpclient.wire - >> "Content-Length: 0
> > ": 06 Jan 2002 00:01:42,951
> > INFO  [Crawler-5] httpclient.wire - >> "user-agent: Librisia {OG}/0.1
> > ": 06 Jan 2002 00:01:42,951
> > INFO  [Crawler-5] httpclient.wire - >> \r\n: 06 Jan 2002 00:01:42,952
> > INFO  [Crawler-5] httpclient.wire - << "<!DOCTYPE HTML PUBLIC "-//IETF//DTD
HTML 3.0//EN"> <HTML>" [\r\n]: 06 Jan 2002 00:01:43,477
> > INFO  [Crawler-5] httpclient.wire - << "<HEAD> <TITLE>FOCUS Online
| Rubriken | Inhalt | Fehler</TITLE>" [\r\n]: 06 Jan 2002 00:01:43,478
> > ...
> > more HTML here
> > ...
> > INFO  [Crawler-5] httpclient.wire - << "Da wir unsere Seiten mehrmals t&auml;glich"
[\r\n]: 06 Jan 2002 00:01:43,881
> > 
> > 
> > Notice a few things:
> > 1. There is no response header, the first thing that HTTPClient gets back is the
HTML body.
> > 
> > 2. This server encountered some error (see 'Fehler' in the TITLE - Fehler means
error auf Deutsch), maybe that's why it didn't respond properly.
> > 
> > 3. The very last data sent from the server is some text, but note that it looks
like it was cut off (there are no </BODY> nor </HTML> tags, even though the response
contained a nice, correct HTML up to that point).
> > So it looks like the server choked, started hanging or something, and HTTPClient
stopped there as well and remained listening for the rest of the server's response, which
never came.
> > 
> > Finally:
> > 
> > netstat -a|grep focus
> > tcp        0      0 77-108-148-98.nyc:34555 focus.de:http           ESTABLISHED
> > 
> > 
> > Obviously, HTTPClient is still holding onto this connection to www.focus.de server!
> > 
> > 
> > I noticed this behaviour with other servers, too.  I don't have their httpclient.wire
logs, but see this from netstat -a:
> > 
> > tcp        0      0 77-108-148-98.nyc:34819 www.ananzi.co.za:http   CLOSE_WAIT
> > 
> > 
> > This looks like a HTTPClient bug to me.
> > I don't know enough about the HTTPClient internals yet to fix it and send a diff.
> > Notice that this also implies the that connection/socket timeout fix from a few
weeks ago doesn't really work, which is what I observed, too.  No matter what timeout I use,
if connections do time out they invariably time out after a bit more than 3 minutes which,
I presume, is some Java's internal, default socket timeout value.
> > 
> > Does anyone know how to fix this 'hang bug'?
> > 
> > Thanks,
> > Otis
> > P.S.
> > If it helps, this is the last log line from HttpConnection, which seems to show
that readLine() method is the method that got stuck.
> > 
> > DEBUG [Crawler-5] org.apache.commons.httpclient.HttpConnection - HttpConnection.readLine():
06 Jan 2002 00:01:43,881
> > 
> > _________________________________________________________________
> > iVillage.com: Solutions for Your Life 
> > Check out the most exciting women's community on the Web   
> > http://www.ivillage.com
> > 
> > --
> > To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>
> 
> _________________________________________________________________
> iVillage.com: Solutions for Your Life 
> Check out the most exciting women's community on the Web   
> http://www.ivillage.com
> 
> --
> To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>

_________________________________________________________________
iVillage.com: Solutions for Your Life 
Check out the most exciting women's community on the Web   
http://www.ivillage.com

--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>


Mime
View raw message