hc-httpclient-users mailing list archives

From "James Ostheimer" <josth...@alumni.virginia.edu>
Subject Memory leak using httpclient
Date Tue, 14 Mar 2006 06:52:34 GMT
Hi-

I am using httpclient in a multi-threaded webcrawler application.  I am using the MultiThreadedHttpConnectionManager
in conjunction with 300 threads that download pages from various sites.

Problem is that I am running out of memory shortly after the process begins.  I used JProfiler
to analyze the memory allocations and it points to:

  • 76.2% - 233,587 kB - 6,626 alloc. org.apache.commons.httpclient.HttpMethod.getResponseBodyAsString

as the culprit (at most there should be a little over 300 allocations live at once, as there are 300 threads
operating).  Other relevant information: I am on a Windows XP Pro platform using the
Sun JRE that came with jdk1.5.0_06, and I am using commons-httpclient-3.0.jar.

Here is the code where I initialize the HttpClient:

private HttpClient httpClient;

public CrawlerControllerThread(QueueThread qt, MessageReceiver receiver, int maxThreads,
        String flag, boolean filter, String filterString, String dbType) {
    this.qt = qt;
    this.receiver = receiver;
    this.maxThreads = maxThreads;
    this.flag = flag;
    this.filter = filter;
    this.filterString = filterString;
    this.dbType = dbType;
    threads = new ArrayList();
    lastStatus = new HashMap();

    HttpConnectionManagerParams htcmp = new HttpConnectionManagerParams();
    htcmp.setMaxTotalConnections(maxThreads);
    htcmp.setDefaultMaxConnectionsPerHost(10);
    htcmp.setSoTimeout(5000);
    MultiThreadedHttpConnectionManager mtcm = new MultiThreadedHttpConnectionManager();
    mtcm.setParams(htcmp);
    httpClient = new HttpClient(mtcm);
}

The httpClient reference is then passed to all the crawling threads, where it is
used as follows:

private String getPageApache(URL pageURL, ArrayList unProcessed) {
    SaveURL saveURL = new SaveURL();
    HttpMethod method = null;
    HttpURLConnection urlConnection = null;
    String rawPage = "";
    try {
        method = new GetMethod(pageURL.toExternalForm());
        method.setFollowRedirects(true);
        method.setRequestHeader("Content-type", "text/html");
        int statusCode = httpClient.executeMethod(method);
        // urlConnection = new HttpURLConnection(method, pageURL);
        logger.debug("Requesting: " + pageURL.toExternalForm());

        rawPage = method.getResponseBodyAsString();
        // rawPage = saveURL.getURL(urlConnection);
        if (rawPage == null) {
            unProcessed.add(pageURL);
        }
        return rawPage;
    } catch (IllegalArgumentException e) {
        // malformed URL; skip this page
    } catch (HttpException e) {
        // protocol error; skip this page
    } catch (IOException e) {
        // transport error; queue the page for a retry
        unProcessed.add(pageURL);
    } finally {
        if (method != null) {
            method.releaseConnection();
        }
        try {
            if (urlConnection != null && urlConnection.getInputStream() != null) {
                urlConnection.getInputStream().close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        urlConnection = null;
        method = null;
    }
    return null;
}

As you can see, I release the connection in the finally block, so that should not be the
problem. After getPageApache returns, the page string is processed and then set to null for
garbage collection. I have been playing with this (closing streams, using
HttpURLConnection instead of GetMethod) and I cannot find the answer.  Indeed, it seems
the answer does not lie in my code.
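For completeness, here is the direction I have been experimenting with most recently: capping how much of each response body gets buffered, by reading from method.getResponseBodyAsStream() instead of getResponseBodyAsString(). This is only a sketch, not working crawler code; the readCapped helper and the 4 KB cap are my own placeholders, not part of the HttpClient API, and it is shown here against a plain InputStream so it stands alone:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CappedReader {

    // Read at most maxBytes from the stream and decode as ISO-8859-1
    // (HTTP's default charset). An I/O error mid-body is treated as
    // end-of-stream for this sketch.
    static String readCapped(InputStream in, int maxBytes) {
        byte[] buf = new byte[4096];
        StringBuilder sb = new StringBuilder();
        int total = 0;
        try {
            int n;
            while (total < maxBytes
                    && (n = in.read(buf, 0, Math.min(buf.length, maxBytes - total))) != -1) {
                sb.append(new String(buf, 0, n, "ISO-8859-1"));
                total += n;
            }
        } catch (IOException e) {
            // a partial page is better than none for a crawler
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for method.getResponseBodyAsStream(): a 10,000-byte body.
        byte[] body = new byte[10000];
        String page = readCapped(new ByteArrayInputStream(body), 4096);
        System.out.println(page.length()); // prints 4096
    }
}
```

In the crawler this helper would be fed method.getResponseBodyAsStream() inside the existing try block, with releaseConnection() staying in the finally as before.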

I greatly appreciate any help that anyone can give me; I am at the end of my rope with this
one.

James
