hc-httpclient-users mailing list archives

From "Steve Terrell" <Steven.Terr...@guideworkstv.com>
Subject RE: Memory leak using httpclient
Date Wed, 15 Mar 2006 12:39:21 GMT
James,
   Keep in mind that Java memory profilers tend to report which objects
are not being freed (and where they were allocated), not why they are
leaking. Your code is holding a reference to something that it should not.
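To illustrate the distinction, here is a minimal, hypothetical Java sketch
(RetentionDemo, fetchPage and process are invented names, not taken from
the crawler code below). A profiler would attribute the page Strings to
the method that created them, but what actually keeps them alive is a
long-lived collection that still references every page.

```java
import java.util.ArrayList;
import java.util.List;

public class RetentionDemo {

    // The real leak: a long-lived collection that is never cleared.
    static final List<String> processedPages = new ArrayList<String>();

    // Hypothetical stand-in for a call that allocates a large response body
    // (the allocation site a profiler would report).
    static String fetchPage(int i) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < 1000; k++) {
            sb.append('x');
        }
        return sb.toString() + i;
    }

    // Hypothetical processing step that quietly keeps a reference.
    static void process(String page) {
        processedPages.add(page);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            String page = fetchPage(i);
            process(page);
            page = null; // frees nothing: processedPages still references the String
        }
        System.out.println("Pages still reachable: " + processedPages.size());
    }
}
```

Running main prints "Pages still reachable: 100". Setting the local
variable to null does not make a page collectable while another
reference to it exists, so the useful question to ask the profiler is
what is still referencing those Strings, not where they were allocated.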
   I have done some extensive load/performance testing with my
HttpClient-based application. After 250 million calls to a Tomcat
servlet, there were no observed memory leaks. That was with HttpClient
3.0rc3 and Java 1.5.0_06.
   Our performance testing also showed that performance slowed down when
our application went past 100 threads. This may be due to a limitation
of the Tomcat instance we were calling. But with 300 threads, I wonder
if your application is spending more time context switching between
threads than doing real work. This was on a 3.0GHz dual-processor
machine running Linux.
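One way to act on that is to cap concurrency with a fixed-size pool from
java.util.concurrent (available in Java 5, which both setups use) instead
of running one thread per in-flight download. This is a sketch under
assumptions: the class name, the POOL_SIZE of 50 and the placeholder task
body are hypothetical, and the right pool size would have to be found by
load testing, as above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedCrawlerPool {

    // Hypothetical cap on concurrent downloads; tune by load testing.
    static final int POOL_SIZE = 50;

    // Runs one task per URL on at most POOL_SIZE threads and returns how
    // many tasks completed. Tasks beyond POOL_SIZE wait in the executor's queue.
    public static int crawl(int urlCount) {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        final AtomicInteger fetched = new AtomicInteger();
        for (int i = 0; i < urlCount; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    // Placeholder for the real fetch, e.g. a call to getPageApache(...)
                    fetched.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already queued tasks still run
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return fetched.get();
    }

    public static void main(String[] args) {
        System.out.println("Fetched: " + crawl(300));
    }
}
```

Because the pool never runs more than POOL_SIZE fetches at once, the
number of threads competing for the CPU (and the number of response
bodies held in memory at the same moment) stays bounded no matter how
many URLs are queued.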

--Steve

-----Original Message-----
From: James Ostheimer [mailto:jostheim@alumni.virginia.edu] 
Sent: Tuesday, March 14, 2006 1:53 AM
To: httpclient-user@jakarta.apache.org
Subject: Memory leak using httpclient

Hi-

I am using HttpClient in a multi-threaded web crawler application. I am
using the MultiThreadedHttpConnectionManager in conjunction with 300
threads that download pages from various sites.

The problem is that I run out of memory shortly after the process
begins. I used JProfiler to analyze the memory allocation stacks, and it
points to

  76.2% - 233,587 kB - 6,626 alloc.
  org.apache.commons.httpclient.HttpMethod.getResponseBodyAsString

as the culprit (there should be at most a little over 300 allocations,
since there are 300 threads operating at once). Other relevant
information: I am on Windows XP Pro, using the Sun JRE that came with
jdk1.5.0_06, and commons-httpclient-3.0.jar.
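If those 6,626 allocations are all still live, page Strings are being
retained after use; separately, each getResponseBodyAsString() call
buffers the whole response body in memory, however large it is. A
minimal sketch of bounding that per-request cost, assuming the caller
passes in the stream from method.getResponseBodyAsStream() and a charset
such as the one GetMethod's getResponseCharSet() reports. The class name
CappedBody and the 1024-byte limit used in main are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class CappedBody {

    // Reads at most maxBytes from in and decodes them with the given charset.
    // Returns null as soon as the body exceeds maxBytes, so the caller can
    // skip the page instead of buffering an arbitrarily large response.
    public static String readCapped(InputStream in, int maxBytes, String charset)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
            if (total > maxBytes) {
                return null; // too large: stop reading and discard the partial body
            }
            out.write(buf, 0, n);
        }
        return out.toString(charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] small = "<html>ok</html>".getBytes("ISO-8859-1");
        byte[] big = new byte[10000];
        System.out.println(readCapped(new ByteArrayInputStream(small), 1024, "ISO-8859-1"));
        System.out.println(readCapped(new ByteArrayInputStream(big), 1024, "ISO-8859-1"));
    }
}
```

Here main prints "<html>ok</html>" and then "null" for the 10,000-byte
body. Note that a cap only limits the size of each page; it does not fix
retention if something else keeps the returned Strings reachable.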

Here is the code where I initialize the HttpClient:

private HttpClient httpClient;

public CrawlerControllerThread(QueueThread qt, MessageReceiver receiver,
        int maxThreads, String flag, boolean filter, String filterString,
        String dbType) {
    this.qt = qt;
    this.receiver = receiver;
    this.maxThreads = maxThreads;
    this.flag = flag;
    this.filter = filter;
    this.filterString = filterString;
    this.dbType = dbType;
    threads = new ArrayList();
    lastStatus = new HashMap();

    // One connection manager (and one HttpClient) shared by all crawler threads
    HttpConnectionManagerParams htcmp = new HttpConnectionManagerParams();
    htcmp.setMaxTotalConnections(maxThreads);  // pool size matches the thread count
    htcmp.setDefaultMaxConnectionsPerHost(10); // at most 10 connections to any one host
    htcmp.setSoTimeout(5000);                  // 5-second socket read timeout
    MultiThreadedHttpConnectionManager mtcm =
            new MultiThreadedHttpConnectionManager();
    mtcm.setParams(htcmp);
    httpClient = new HttpClient(mtcm);
}

The httpClient reference is then passed to all the crawling threads,
where it is used as follows:

private String getPageApache(URL pageURL, ArrayList unProcessed) {
    SaveURL saveURL = new SaveURL();
    HttpMethod method = null;
    HttpURLConnection urlConnection = null;
    String rawPage = "";
    try {
        method = new GetMethod(pageURL.toExternalForm());
        method.setFollowRedirects(true);
        method.setRequestHeader("Content-type", "text/html");
        int statusCode = httpClient.executeMethod(method);
        // urlConnection = new HttpURLConnection(method, pageURL);
        logger.debug("Requesting: " + pageURL.toExternalForm());

        // Reads the entire response body into memory as a single String
        rawPage = method.getResponseBodyAsString();
        // rawPage = saveURL.getURL(urlConnection);
        if (rawPage == null) {
            unProcessed.add(pageURL);
        }
        return rawPage;
    } catch (IllegalArgumentException e) {
        // e.printStackTrace();
    } catch (HttpException e) {
        // e.printStackTrace();
    } catch (IOException e) {
        unProcessed.add(pageURL);
        // e.printStackTrace();
    } finally {
        // Return the connection to the MultiThreadedHttpConnectionManager pool
        if (method != null) {
            method.releaseConnection();
        }
        try {
            if (urlConnection != null) {
                if (urlConnection.getInputStream() != null) {
                    urlConnection.getInputStream().close();
                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        urlConnection = null;
        method = null;
    }
    return null;
}

As you can see, I release the connection in the finally block, so that
should not be a problem. After getPageApache() returns, the page string
is processed and then set to null so that it can be garbage-collected.
I have been playing with this, closing streams and using HttpURLConnection
instead of GetMethod, and I cannot find the answer. Indeed, it seems the
answer does not lie in my code.

I greatly appreciate any help that anyone can give me; I am at the end
of my rope with this one.

James

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org

