hc-dev mailing list archives

From Christian Kohlschuetter ...@newsclub.de>
Subject Re: Proposal: Configurable HTTP Response length limit
Date Fri, 10 Oct 2003 17:54:05 GMT
I completely agree with you that we all should write standards-compliant HTTP 
web pages/CGI programs/Servlets etc.

Unfortunately, it is not always in our hands. As nearly everybody can write 
PHP scripts today, clients which attempt to read from them should be 
fault-tolerant.

If you want to run an HTTP client against an arbitrary number of URLs pointing 
to unknown web servers/pages (as in my case: I am writing a web crawler), 
you must be able to guarantee a deadlock-free, fault-tolerant way of reading 
each page. Have you ever heard of "spider traps"?
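
A minimal sketch of what such a guard could look like, using only the JDK. GuardedInputStream is a hypothetical name, not an HttpClient class, and the two limits (a byte cap and a time budget) are assumptions chosen for illustration. The deadline is only checked between reads, so a read that blocks in the socket still needs a socket timeout on top of this.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper (not an HttpClient class): caps body size and total read time. */
final class GuardedInputStream extends FilterInputStream {
    private final long maxBytes;
    private final long deadlineMillis;
    private long count = 0;

    GuardedInputStream(InputStream in, long maxBytes, long timeBudgetMillis) {
        super(in);
        this.maxBytes = maxBytes;
        this.deadlineMillis = System.currentTimeMillis() + timeBudgetMillis;
    }

    /** Checked before every read; cannot interrupt a read that is already blocked. */
    private void checkDeadline() throws IOException {
        if (System.currentTimeMillis() > deadlineMillis) {
            throw new IOException("time budget for this response exceeded");
        }
    }

    private IOException tooLong() {
        return new IOException("response body exceeds " + maxBytes + " bytes");
    }

    @Override
    public int read() throws IOException {
        checkDeadline();
        int b = super.read();
        if (b >= 0 && ++count > maxBytes) {
            throw tooLong();
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        checkDeadline();
        long room = maxBytes - count;
        if (room <= 0) {
            // Limit reached: a clean EOF is still fine, any further byte is not.
            if (super.read() < 0) {
                return -1;
            }
            throw tooLong();
        }
        int n = super.read(buf, off, (int) Math.min(len, room));
        if (n > 0) {
            count += n;
        }
        return n;
    }

    public static void main(String[] args) {
        // A 64-byte body read through a 16-byte cap with a 5 second budget.
        InputStream body = new GuardedInputStream(
                new ByteArrayInputStream(new byte[64]), 16, 5000);
        try {
            while (body.read() >= 0) {
                // consume
            }
        } catch (IOException e) {
            System.err.println("aborted: " + e.getMessage());
        }
    }
}
```

A body of exactly maxBytes bytes passes through unharmed; the exception is raised only once a byte beyond the cap shows up. skip() is left unguarded in this sketch.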

Consider a simple sequence of HttpClient calls:

// needs java.io.IOException, java.io.InputStream and the commons-httpclient classes
public void test() throws IOException {
    HttpClient client = new HttpClient();
    HttpMethod m = new GetMethod("http://localhost/testfile.php");
    client.executeMethod(m);

    // --- bytes limit as suggested in discussion
    InputStream body = m.getResponseBodyAsStream();
    int limit = 10; // limit to first ten bytes
    int i;
    for (i = 0; i < limit; i++) {
        int b = body.read();
        if (b < 0) {
            System.err.println("EOF at byte " + i);
            break;
        }
    }
    // ---

    m.releaseConnection();
}


The following PHP scripts will cause HttpClient to loop endlessly (1) or to 
hang (2):

Test 1: endless.php
while(TRUE) {
  print "The UNIX time is ".time()."<br>\n";
}

This will cause the program to hang at "m.releaseConnection()".
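
A sketch of one way around this, assuming the hang comes from the rest of the never-ending body being read before the connection is handed back (the message does not state the cause explicitly). BoundedDrain and EndlessStream are hypothetical names used for illustration, not HttpClient APIs; EndlessStream merely stands in for endless.php.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not an HttpClient API) for cleaning up after a partial read. */
final class BoundedDrain {

    /** Stand-in for endless.php: a body that never reaches EOF. */
    static final class EndlessStream extends InputStream {
        @Override
        public int read() {
            return 'X';
        }
    }

    /**
     * Discards at most maxBytes of the remaining body.
     * Returns true if EOF was seen within that budget (the connection could be
     * reused), false if the budget ran out first (the caller should close the
     * socket instead of reading on).
     */
    static boolean drain(InputStream in, long maxBytes) throws IOException {
        byte[] buf = new byte[4096];
        long left = maxBytes;
        while (left > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, left));
            if (n < 0) {
                return true; // clean end of body
            }
            left -= n;
        }
        return false; // gave up; do not keep reading
    }

    public static void main(String[] args) throws IOException {
        boolean reusable = drain(new EndlessStream(), 64 * 1024);
        System.out.println(reusable ? "connection reusable" : "close the connection");
    }
}
```

The design choice here is to trade connection reuse for liveness: when the budget runs out, dropping the socket costs one reconnect, whereas reading on may never return.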

Test 2: hang-in-headers.php
  // remember to set an adequate memory limit in php.ini
  // or use Apache's "asis"-feature instead of PHP

  $x = str_repeat("X",1024*1024*32); // send 32M of 'X'
  Header("HTTP/1.0 300 Multiple Choices");
  Header("Location: http://localhost/".$x);

In this case you can even remove the getResponseBodyAsStream() part; it will 
still crash with an OutOfMemoryError, because 32M won't usually fit into the 
JVM's memory.
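
A sketch of a size cap on header lines, assuming the OutOfMemoryError comes from the oversized Location line being buffered in memory as a whole. CappedLineReader is a hypothetical helper, not an HttpClient API, and the cap value is up to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not an HttpClient API): reads one header line with a size cap. */
final class CappedLineReader {

    /**
     * Reads up to the next LF and returns the line without its CR/LF.
     * Returns null if EOF is hit before any byte was read.
     * Throws IOException as soon as the line grows beyond maxLen bytes,
     * so an endless or huge header never has to fit into memory.
     */
    static String readLine(InputStream in, int maxLen) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) >= 0) {
            if (b == '\n') {
                break;
            }
            if (line.size() >= maxLen) {
                throw new IOException("header line longer than " + maxLen + " bytes");
            }
            line.write(b);
        }
        if (b < 0 && line.size() == 0) {
            return null;
        }
        byte[] raw = line.toByteArray();
        int len = raw.length;
        if (len > 0 && raw[len - 1] == '\r') {
            len--; // strip the CR of a CRLF line end
        }
        return new String(raw, 0, len, "ISO-8859-1");
    }
}
```

With a cap of a few kilobytes, the 32M Location header from Test 2 would be rejected after the first few kilobytes instead of being accumulated.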

Christian Kohlschütter

http://www.newsclub.de - The Meta News Service
