hc-httpclient-users mailing list archives

From "James Ostheimer" <josth...@alumni.virginia.edu>
Subject Re: Memory leak using httpclient
Date Wed, 15 Mar 2006 16:10:24 GMT
Hi-

Thanks to everyone for the help in trying to figure this out.

Indeed everyone is correct and the problem, as nefarious as it is, does not 
seem to be in HttpClient.  Unfortunately (for me) the garbage collection of 
StringBuilders (or Buffers, decided to use Builders and 1.5) that are turned 
into Strings seems to be extremely slow.  What I mean is that many old 
allocations seem to be allowed to hang around for quite a while before being 
garbage collected, despite the fact that they aren't used any more (they are 
in fact nulled in my code).  I have observed the heap size grow and then 
fall off a cliff once the garbage collector finally decides it can clean up 
those instances.
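
For anyone who wants to watch the same sawtooth without a profiler, a minimal sketch of sampling heap use from inside the JVM follows. The class name HeapSample is hypothetical; the Runtime calls are standard. The numbers only show how much heap is occupied at that moment, not which objects are still reachable.

```java
/** Hypothetical helper for logging heap use; not part of the crawler. */
final class HeapSample {
    /** Heap currently occupied, in bytes, as reported by the JVM. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One log-friendly line: occupied, committed, and maximum heap in MB. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return "used=" + (usedBytes() / mb) + "MB"
                + " committed=" + (rt.totalMemory() / mb) + "MB"
                + " max=" + (rt.maxMemory() / mb) + "MB";
    }
}
```

Logging HeapSample.describe() every few seconds from one thread should be enough to see the growth and the drop after a collection.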

One thing that really did help (the nulled instances were not being 
collected at all before this) was removing any stored references to the 
crawler threads.  I was keeping a reference to each running thread in a 
controller class to compute statistics on how well I was doing (download 
speed).  When I removed those references so that nothing else pointed at a 
thread once it finished, memory use started growing much more slowly.
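
One way to keep the statistics without the controller holding thread references is to invert the dependency: each thread reports into a small shared counter object. This is only a sketch of that idea, assuming download speed is the statistic wanted; CrawlStats and its methods are hypothetical names, not the code I actually run.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stats holder: workers report into it; it never references a worker. */
final class CrawlStats {
    private final AtomicLong bytesDownloaded = new AtomicLong();
    private final AtomicLong pagesDownloaded = new AtomicLong();
    private final long startNanos = System.nanoTime();

    /** Called by a crawler thread after each page it downloads. */
    void recordPage(long bytes) {
        bytesDownloaded.addAndGet(bytes);
        pagesDownloaded.incrementAndGet();
    }

    long totalBytes() { return bytesDownloaded.get(); }

    long totalPages() { return pagesDownloaded.get(); }

    /** Average download speed in bytes per second since this object was created. */
    double bytesPerSecond() {
        double seconds = (System.nanoTime() - startNanos) / 1e9;
        return seconds > 0 ? bytesDownloaded.get() / seconds : 0.0;
    }
}
```

The controller keeps one CrawlStats and hands it to every thread, so a finished thread and everything it referenced become unreachable as soon as it exits.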

If anyone has any suggestions (maybe making the garbage collector more 
aggressive?), I would love to hear them.  I do want to apologize for bringing 
up this problem as it turned out not to be an HttpClient problem, and thank 
everyone for their help.

Thanks

James

----- Original Message ----- 
From: "Steve Terrell" <Steven.Terrell@guideworkstv.com>
To: "HttpClient User Discussion" <httpclient-user@jakarta.apache.org>
Sent: Wednesday, March 15, 2006 7:39 AM
Subject: RE: Memory leak using httpclient


James,
   Keep in mind that Java memory profilers tend to report what resource
is not being freed, not what is leaking. Your code is holding a
reference to something that it should not.
   I have done some extensive load/performance testing with my
HttpClient based application. After 250 million calls to a Tomcat
servlet, there were no observed memory leaks. That was with HttpClient
3.0rc3, Java 1.5.06.
   Our performance testing also showed that performance slowed down when
our application went past 100 threads. This may be due to a limitation
with the Tomcat instance we were calling. But with 300 threads, I wonder
if your application is spending more time context switching between
threads than real work. This was on a 3.0GHz dual processor machine
running Linux.
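
   If you want to experiment with fewer threads, a fixed-size pool makes
the thread count a single number you can tune. A rough sketch, assuming
Java 5's java.util.concurrent; BoundedCrawl is a made-up name and the task
body is a placeholder (it returns the URL length) standing in for the real
download:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: run one task per URL on a bounded number of threads. */
final class BoundedCrawl {
    /** Returns one result per URL, in input order, using at most poolSize threads. */
    static List<Integer> run(List<String> urls, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (final String url : urls) {
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        // Placeholder for the real page download.
                        return url.length();
                    }
                }));
            }
            List<Integer> results = new ArrayList<Integer>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // lets the worker threads exit once the queue is drained
        }
    }
}
```

   Timing the same URL list with poolSize at 50, 100 and 300 would show
whether the extra threads are actually buying you anything.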

--Steve

-----Original Message-----
From: James Ostheimer [mailto:jostheim@alumni.virginia.edu]
Sent: Tuesday, March 14, 2006 1:53 AM
To: httpclient-user@jakarta.apache.org
Subject: Memory leak using httpclient

Hi-

I am using httpclient in a multi-threaded webcrawler application.  I am
using the MultiThreadedHttpConnectionManager in conjunction with 300
threads that download pages from various sites.

Problem is that I am running out of memory shortly after the process
begins.  I used JProfiler to analyze the memory stacks and it points to:
  * 76.2% - 233,587 kB - 6,626 alloc.
    org.apache.commons.httpclient.HttpMethod.getResponseBodyAsString
as the culprit (at most there should be a little over 300 allocations as
there are 300 threads operating at once).  Other relevant information, I
am on a Windows XP Pro platform using the SUN JRE that came with
jdk1.5.0_06.  I am using commons-httpclient-3.0.jar.

Here is the code where I initialize the HttpClient:

private HttpClient httpClient;

 public CrawlerControllerThread(QueueThread qt, MessageReceiver receiver,
   int maxThreads, String flag, boolean filter, String filterString,
   String dbType) {
  this.qt = qt;
  this.receiver = receiver;
  this.maxThreads = maxThreads;
  this.flag = flag;
  this.filter = filter;
  this.filterString = filterString;
  this.dbType = dbType;
  threads = new ArrayList();
  lastStatus = new HashMap();

  HttpConnectionManagerParams htcmp = new HttpConnectionManagerParams();
  htcmp.setMaxTotalConnections(maxThreads);
  htcmp.setDefaultMaxConnectionsPerHost(10);
  htcmp.setSoTimeout(5000);
  MultiThreadedHttpConnectionManager mtcm = new MultiThreadedHttpConnectionManager();
  mtcm.setParams(htcmp);
  httpClient = new HttpClient(mtcm);


 }

The client reference to httpClient is then passed to all the crawling
threads where it is used as follows:

private String getPageApache(URL pageURL, ArrayList unProcessed) {
  SaveURL saveURL = new SaveURL();
  HttpMethod method = null;
  HttpURLConnection urlConnection = null;
  String rawPage = "";
  try {
   method = new GetMethod(pageURL.toExternalForm());
   method.setFollowRedirects(true);
   method.setRequestHeader("Content-type", "text/html");
   int statusCode = httpClient.executeMethod(method);
//   urlConnection = new HttpURLConnection(method,
//     pageURL);
   logger.debug("Requesting: "+pageURL.toExternalForm());


   rawPage = method.getResponseBodyAsString();
   //rawPage = saveURL.getURL(urlConnection);
   if(rawPage == null){
    unProcessed.add(pageURL);
   }
   return rawPage;
  } catch (IllegalArgumentException e) {
   //e.printStackTrace();

  }
  catch (HttpException e) {

   //e.printStackTrace();
  } catch (IOException e) {
   unProcessed.add(pageURL);
   //e.printStackTrace();
  }finally {
   if(method != null) {
    method.releaseConnection();
   }
   try {
    if(urlConnection != null) {
     if(urlConnection.getInputStream() != null) {
      urlConnection.getInputStream().close();
     }
    }
   } catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
   }
   urlConnection = null;
   method = null;
  }
  return null;
 }

As you can see, I release the connection in the finally block, so
that should not be a problem. After getPageApache above runs, the
returned page is processed as a string and then set to null for garbage
collection. I have been playing with this, closing streams, using
HttpURLConnection instead of the GetMethod, and I cannot find the
answer.  Indeed it seems the answer does not lie in my code.
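
Since getResponseBodyAsString buffers the entire body in memory, one thing
I could try is reading the body as a stream with a size cap, so a single
huge page cannot hold a large buffer. A minimal sketch using only java.io;
BoundedBody and the maxBytes cutoff are hypothetical, and with HttpClient
3.x the stream would come from method.getResponseBodyAsStream():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: read a response body with an upper bound on its size. */
final class BoundedBody {
    /**
     * Reads the whole stream and decodes it with the given charset.
     * Returns null if the body is larger than maxBytes.
     */
    static String read(InputStream in, int maxBytes, String charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (out.size() + n > maxBytes) {
                return null; // oversized page: the caller can skip it
            }
            out.write(chunk, 0, n);
        }
        return out.toString(charset);
    }
}
```

The call still has to happen before releaseConnection(), since releasing
the connection ends access to the response stream.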

I greatly appreciate any help that anyone can give me; I am at the end
of my rope with this one.

James

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org

