hc-httpclient-users mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: Memory leak using httpclient
Date Wed, 15 Mar 2006 17:26:30 GMT
On Wed, 2006-03-15 at 11:10 -0500, James Ostheimer wrote:
> Hi-
> 
> Thanks to everyone for the help in trying to figure this out.
> 
> Indeed everyone is correct: the problem, nefarious as it is, does not
> seem to be in HttpClient.  Unfortunately (for me), garbage collection of
> StringBuilders (or StringBuffers; I decided to use StringBuilders since
> I'm on 1.5) that have been turned into Strings seems to be extremely
> slow.  What I mean is that many old allocations seem to hang around for
> quite a while before being garbage collected, despite the fact that they
> aren't used any more (they are in fact nulled in my code).  I have
> watched the heap size grow and then fall off a cliff once the garbage
> collector finally decides it can clean up those instances.
> 
> One thing that really did help (the nulled instances were not being
> collected at all before this) was removing any stored references to the
> crawler threads.  I was keeping a reference to each running thread in a
> controller class to compute statistics on how well I was doing (download
> speed).  When I removed the references so that each thread was completely
> dereferenced (on its own), memory usage started growing much more slowly
> (see the sketch below).
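> 
> To make that concrete, here is a rough sketch of keeping the stats
> without holding Thread references (all class and field names are made up
> for illustration; AtomicLong is in java.util.concurrent.atomic, new in
> 1.5):
> 
> import java.util.concurrent.atomic.AtomicLong;
> 
> // Hypothetical stats holder: each crawler thread bumps shared
> // counters, so the controller can compute download speed without
> // keeping a reference to any Thread object.
> public class CrawlStats {
>     private final AtomicLong bytesDownloaded = new AtomicLong();
>     private final AtomicLong pagesFetched = new AtomicLong();
> 
>     public void record(int bytes) {
>         bytesDownloaded.addAndGet(bytes);
>         pagesFetched.incrementAndGet();
>     }
> 
>     public long totalBytes() { return bytesDownloaded.get(); }
>     public long totalPages() { return pagesFetched.get(); }
> }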
> 
> If anyone has any suggestions (maybe making the garbage collector more
> aggressive?), I would love to hear them.  I do want to apologize for
> bringing up this problem, as it turned out not to be an HttpClient
> problem, and to thank everyone for their help.
> 

James,

I believe you should approach the problem from a different angle.
Instead of trying to make GC more aggressive, consider revising your
code to reduce the amount of garbage it produces.  
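
For instance, a rough sketch of what I mean with HttpClient 3.x: stream
the response body with getResponseBodyAsStream() instead of
materializing every page via getResponseBodyAsString(). The 8 KB buffer
and the page-size cap below are arbitrary example values, not tuned
recommendations.

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

public class StreamingFetch {

    // Arbitrary cap on how much of a page to keep.
    private static final int MAX_PAGE_CHARS = 512 * 1024;

    static String fetch(HttpClient client, String url) throws IOException {
        GetMethod method = new GetMethod(url);
        try {
            client.executeMethod(method);
            InputStream in = method.getResponseBodyAsStream();
            if (in == null) {
                return null;
            }
            // One bounded buffer per call instead of the unbounded
            // String the convenience method builds behind the scenes.
            Reader reader =
                    new InputStreamReader(in, method.getResponseCharSet());
            char[] buf = new char[8192];
            StringBuilder page = new StringBuilder();
            int n;
            while ((n = reader.read(buf)) != -1
                    && page.length() < MAX_PAGE_CHARS) {
                page.append(buf, 0, n);
            }
            return page.toString();
        } finally {
            method.releaseConnection();
        }
    }
}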

Oleg

> Thanks
> 
> James
> 
> ----- Original Message ----- 
> From: "Steve Terrell" <Steven.Terrell@guideworkstv.com>
> To: "HttpClient User Discussion" <httpclient-user@jakarta.apache.org>
> Sent: Wednesday, March 15, 2006 7:39 AM
> Subject: RE: Memory leak using httpclient
> 
> 
> James,
>    Keep in mind that Java memory profilers tend to report what is not
> being freed, not what is causing the leak. Your code is holding a
> reference to something that it should not.
>    I have done some extensive load/performance testing with my
> HttpClient-based application. After 250 million calls to a Tomcat
> servlet, there were no observed memory leaks. That was with HttpClient
> 3.0rc3 on Java 1.5.0_06.
>    Our performance testing also showed that throughput dropped off when
> our application went past 100 threads. This may be due to a limitation
> of the Tomcat instance we were calling. But with 300 threads, I wonder
> if your application is spending more time context switching between
> threads than doing real work; see the sketch below. This was on a
> 3.0GHz dual-processor machine running Linux.
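> 
> A rough sketch of what I mean, using the java.util.concurrent pool that
> arrived in 1.5 (the pool size of 50 is just a placeholder to tune; the
> class and method names are made up for illustration):
> 
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> 
> public class CrawlerPool {
>     // Bound the number of live worker threads with a fixed pool
>     // instead of spawning 300 Threads directly; queued URLs wait
>     // their turn rather than adding context-switch overhead.
>     static void crawl(List<String> urls) {
>         ExecutorService pool = Executors.newFixedThreadPool(50);
>         for (final String url : urls) {
>             pool.execute(new Runnable() {
>                 public void run() {
>                     // fetch and process 'url' here
>                 }
>             });
>         }
>         pool.shutdown(); // finish queued work, accept nothing new
>     }
> }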
> 
> --Steve
> 
> -----Original Message-----
> From: James Ostheimer [mailto:jostheim@alumni.virginia.edu]
> Sent: Tuesday, March 14, 2006 1:53 AM
> To: httpclient-user@jakarta.apache.org
> Subject: Memory leak using httpclient
> 
> Hi-
> 
> I am using httpclient in a multi-threaded webcrawler application.  I am
> using the MultiThreadedHttpConnectionManager in conjunction with 300
> threads that download pages from various sites.
> 
> Problem is that I am running out of memory shortly after the process
> begins.  I used JProfiler to analyze the heap, and it points to:
> 
>   76.2% - 233,587 kB - 6,626 alloc.
>   org.apache.commons.httpclient.HttpMethod.getResponseBodyAsString
> 
> as the culprit (at most there should be a little over 300 live
> allocations, as there are 300 threads operating at once).  Other relevant
> information: I am on Windows XP Pro using the Sun JRE that came with
> jdk1.5.0_06, and commons-httpclient-3.0.jar.
> 
> Here is the code where I initialize the HttpClient:
> 
> private HttpClient httpClient;
> 
> public CrawlerControllerThread(QueueThread qt, MessageReceiver receiver,
>         int maxThreads, String flag, boolean filter, String filterString,
>         String dbType) {
>     this.qt = qt;
>     this.receiver = receiver;
>     this.maxThreads = maxThreads;
>     this.flag = flag;
>     this.filter = filter;
>     this.filterString = filterString;
>     this.dbType = dbType;
>     threads = new ArrayList();
>     lastStatus = new HashMap();
> 
>     HttpConnectionManagerParams htcmp = new HttpConnectionManagerParams();
>     htcmp.setMaxTotalConnections(maxThreads);
>     htcmp.setDefaultMaxConnectionsPerHost(10);
>     htcmp.setSoTimeout(5000);
>     MultiThreadedHttpConnectionManager mtcm =
>             new MultiThreadedHttpConnectionManager();
>     mtcm.setParams(htcmp);
>     httpClient = new HttpClient(mtcm);
> }
> 
> The client reference to httpClient is then passed to all the crawling
> threads where it is used as follows:
> 
> private String getPageApache(URL pageURL, ArrayList unProcessed) {
>     SaveURL saveURL = new SaveURL();
>     HttpMethod method = null;
>     HttpURLConnection urlConnection = null;
>     String rawPage = "";
>     try {
>         method = new GetMethod(pageURL.toExternalForm());
>         method.setFollowRedirects(true);
>         method.setRequestHeader("Content-type", "text/html");
>         int statusCode = httpClient.executeMethod(method);
>         // urlConnection = new HttpURLConnection(method, pageURL);
>         logger.debug("Requesting: " + pageURL.toExternalForm());
> 
>         rawPage = method.getResponseBodyAsString();
>         // rawPage = saveURL.getURL(urlConnection);
>         if (rawPage == null) {
>             unProcessed.add(pageURL);
>         }
>         return rawPage;
>     } catch (IllegalArgumentException e) {
>         // e.printStackTrace();
>     } catch (HttpException e) {
>         // e.printStackTrace();
>     } catch (IOException e) {
>         unProcessed.add(pageURL);
>         // e.printStackTrace();
>     } finally {
>         // Return the pooled connection whether or not the fetch worked.
>         if (method != null) {
>             method.releaseConnection();
>         }
>         try {
>             // urlConnection is only non-null when the commented-out
>             // HttpURLConnection path above is enabled.
>             if (urlConnection != null && urlConnection.getInputStream() != null) {
>                 urlConnection.getInputStream().close();
>             }
>         } catch (IOException e) {
>             e.printStackTrace();
>         }
>         urlConnection = null;
>         method = null;
>     }
>     return null;
> }
> 
> As you can see, I release the connection in the finally block, so that
> should not be the problem.  After getPageApache returns, the page string
> is processed and then set to null so it can be garbage collected.  I have
> been playing with this, closing streams and using HttpURLConnection
> instead of GetMethod, and I cannot find the answer.  Indeed, it seems the
> answer does not lie in my code.
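> 
> One guard I have been wondering about, in case very large pages are part
> of the problem (just a sketch: the 1 MB cap is an arbitrary number, and
> this assumes a well-formed Content-Length header).  It would go right
> before the getResponseBodyAsString() call above:
> 
> // 'method' is the HttpMethod from getPageApache; Header is
> // org.apache.commons.httpclient.Header.  The 1 MB cap is an
> // arbitrary example value.
> Header lenHeader = method.getResponseHeader("Content-Length");
> if (lenHeader != null
>         && Long.parseLong(lenHeader.getValue()) > 1024L * 1024L) {
>     unProcessed.add(pageURL);
>     return null;   // the finally block still releases the connection
> }
> rawPage = method.getResponseBodyAsString();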
> 
> I greatly appreciate any help that anyone can give me; I am at the end of
> my rope with this one.
> 
> James
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org

