hc-dev mailing list archives

From Christian Kohlschuetter ...@newsclub.de>
Subject Re: Proposal: Configurable HTTP Response length limit
Date Fri, 10 Oct 2003 17:54:05 GMT
I completely agree with you that we all should write standards-compliant HTTP 
web pages/CGI programs/Servlets etc.

Unfortunately, it is not always in our hands. Since nearly anybody can write 
PHP scripts today, clients that read from them should be 
error-tolerant.

If you want to run an HTTP client against an arbitrary number of URLs pointing 
to unknown web servers/pages (as in my case: I am writing a web crawler), 
you must be able to guarantee a deadlock-free, fault-tolerant way of reading 
each page. Have you ever heard of "spider traps"?

Consider a simple sequence of HttpClient calls:

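import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;

public void test() throws IOException {
    HttpClient client = new HttpClient();
    HttpMethod m = new GetMethod("http://localhost/testfile.php");
    client.executeMethod(m);

    // --- byte limit as suggested in the discussion
    InputStream body = m.getResponseBodyAsStream();
    int limit = 10; // read at most the first ten bytes
    int i;
    for (i = 0; i < limit; i++) {
        int b = body.read();
        if (b < 0) {
            break; // end of stream reached before the limit
        }
    }
    System.err.println((i < limit ? "EOF" : "Limit reached") + " at byte " + i);
    // ---

    m.releaseConnection();
}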

The following PHP scripts will cause HttpClient to loop endlessly (1) or to 
hang (2):


Test 1: endless.php
<?php
set_time_limit(0); // 0 disables PHP's execution time limit
while(TRUE) {
  print "The UNIX time is ".time()."<br>\n";
  flush();
  sleep(1);
}
?>

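This will cause the program to hang at "m.releaseConnection()", presumably 
because releasing the connection first tries to read the rest of the response 
body, which never ends.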
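
A minimal sketch of how a client could guard against such a body: read with 
both a byte budget and an overall deadline, then drop the connection rather 
than drain it. It uses only java.io. The names BoundedReader, readBounded, 
maxBytes and maxMillis are hypothetical and not part of HttpClient. The 
deadline is only checked between reads, so the sketch assumes that each 
single read() returns eventually (for example because a socket timeout is 
set).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedReader {
    /**
     * Reads from the stream until EOF, until maxBytes bytes were read, or
     * until maxMillis milliseconds have elapsed, whichever comes first.
     * The deadline is only checked between read() calls.
     */
    public static byte[] readBounded(InputStream in, int maxBytes, long maxMillis)
            throws IOException {
        long deadline = System.currentTimeMillis() + maxMillis;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (out.size() < maxBytes && System.currentTimeMillis() < deadline) {
            // Never ask for more than what is left of the byte budget.
            int want = Math.min(buf.length, maxBytes - out.size());
            int n = in.read(buf, 0, want);
            if (n < 0) {
                break; // regular end of stream
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

With a body like the one from endless.php, which delivers a line every 
second, a call such as readBounded(body, 65536, 10000) would return after 
roughly ten seconds with whatever had arrived by then. The caller would 
still have to close the underlying socket itself instead of waiting for the 
rest of the body.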


Test 2: hang-in-headers.php
<?php
  // remember to set an adequate memory limit in php.ini
  // or use Apache's "asis"-feature instead of PHP

  $x = str_repeat("X",1024*1024*32); // send 32M of 'X'
  Header("HTTP/1.0 300 Multiple Choices");
  Header("Location: http://localhost/".$x);
?>

In this case you can even remove the getResponseBodyAsStream() part; it will 
still crash with an OutOfMemoryError, because a 32M header usually won't fit 
into the JVM's memory.
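
A minimal sketch of a length limit for header lines, assuming the headers 
are read byte by byte from the raw stream: the reader gives up as soon as a 
line grows beyond a configured maximum instead of buffering it completely. 
LimitedLineReader, readLine and maxLength are hypothetical names for 
illustration, not HttpClient API.

```java
import java.io.IOException;
import java.io.InputStream;

public class LimitedLineReader {
    /**
     * Reads one line terminated by LF and drops a trailing CR.
     * Throws IOException as soon as the line exceeds maxLength bytes
     * (a trailing CR counts toward the limit), so an oversized header is
     * rejected without being held in memory.
     * Returns null if the stream is at EOF before any byte was read.
     * Bytes are mapped to chars one to one (ISO-8859-1).
     */
    public static String readLine(InputStream in, int maxLength) throws IOException {
        StringBuilder line = new StringBuilder();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) >= 0) {
            sawAny = true;
            if (b == '\n') {
                break;
            }
            if (line.length() >= maxLength) {
                throw new IOException("Header line longer than " + maxLength + " bytes");
            }
            line.append((char) b);
        }
        if (!sawAny) {
            return null;
        }
        int len = line.length();
        if (len > 0 && line.charAt(len - 1) == '\r') {
            line.setLength(len - 1); // strip the CR of a CRLF line ending
        }
        return line.toString();
    }
}
```

With a limit of a few kilobytes, the Location header from 
hang-in-headers.php would be rejected with an IOException after the first 
maxLength bytes, long before 32M are allocated.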
-- 
Christian Kohlschütter
ck@newsclub.de

http://www.newsclub.de - Der Meta-Nachrichten-Dienst

