Subject: Re: Memory leak using httpclient
From: Oleg Kalnichevski
To: HttpClient User Discussion
In-Reply-To: <004f01c6484a$f802a320$0b01a8c0@wilco>
References: <19FBA0BAF7AD2F4081C0FE36832A6E84511931@GWDC04.GuideWorks.TV> <004f01c6484a$f802a320$0b01a8c0@wilco>
Date: Wed, 15 Mar 2006 18:26:30 +0100
Message-Id: <1142443590.8683.35.camel@localhost.localdomain>

On Wed, 2006-03-15 at 11:10 -0500, James Ostheimer wrote:
> Hi-
>
> Thanks to everyone for the help in trying to figure this out.
>
> Indeed everyone is correct and the problem, as nefarious as it is, does not
> seem to be in HttpClient. Unfortunately (for me) the garbage collection of
> StringBuilders (or Buffers, decided to use Builders and 1.5) that are turned
> into Strings seems to be extremely slow. What I mean is that many old
> allocations seem to be allowed to hang around for quite a while before being
> garbage collected, despite the fact that they aren't used any more (they are
> in fact nulled in my code). I have observed the heap size grow and then
> fall off a cliff once the garbage collector finally decides it can clean up
> those instances.
>
> One thing that really did help (the nulled instances were not being
> collected at all before this) was removing any stored references to the
> crawler threads. I was keeping a reference to each running thread in a
> controller class to compute statistics on how well I was doing (download
> speed). When I removed the reference so that each thread was completely
> dereferenced (on its own) the memory started going up much more slowly.
>
> If anyone has any suggestions (maybe making the garbage collector more
> aggressive?), I would love to hear them. I do want to apologize for bringing
> up this problem as it turned out not to be an HttpClient problem, and thank
> everyone for their help.
>

James,

I believe you should approach the problem from a different angle. Instead
of trying to make GC more aggressive, consider revising your code to
reduce the amount of garbage it produces.

Oleg

> Thanks
>
> James
>
> ----- Original Message -----
> From: "Steve Terrell"
> To: "HttpClient User Discussion"
> Sent: Wednesday, March 15, 2006 7:39 AM
> Subject: RE: Memory leak using httpclient
>
>
> James,
> Keep in mind that Java memory profilers tend to report what resource
> is not being freed, not what is leaking.
> Your code is holding a reference to something that it should not.
> I have done some extensive load/performance testing with my
> HttpClient based application. After 250 million calls to a Tomcat
> servlet, there were no observed memory leaks. That was with HttpClient
> 3.0rc3, Java 1.5.06.
> Our performance testing also showed that performance slowed down when
> our application went past 100 threads. This may be due to a limitation
> with the Tomcat instance we were calling. But with 300 threads, I wonder
> if your application is spending more time context switching between
> threads than doing real work. This was on a 3.0GHz dual processor machine
> running Linux.
>
> --Steve
>
> -----Original Message-----
> From: James Ostheimer [mailto:jostheim@alumni.virginia.edu]
> Sent: Tuesday, March 14, 2006 1:53 AM
> To: httpclient-user@jakarta.apache.org
> Subject: Memory leak using httpclient
>
> Hi-
>
> I am using httpclient in a multi-threaded webcrawler application. I am
> using the MultiThreadedHttpConnectionManager in conjunction with 300
> threads that download pages from various sites.
>
> Problem is that I am running out of memory shortly after the process
> begins. I used JProfiler to analyze the memory stacks and it points to:
>   - 76.2% - 233,587 kB - 6,626 alloc.
>     org.apache.commons.httpclient.HttpMethod.getResponseBodyAsString
> as the culprit (at most there should be a little over 300 allocations as
> there are 300 threads operating at once). Other relevant information, I
> am on a Windows XP Pro platform using the SUN JRE that came with
> jdk1.5.0_06. I am using commons-httpclient-3.0.jar.
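The profiler output points at getResponseBodyAsString(), which, as far as I understand the 3.0 API, reads the entire response body into memory and then decodes it into a String, so every page costs at least a byte array plus a String of comparable size, and nothing bounds how large a single page can get. A minimal sketch of one way to produce less garbage, assuming the body is instead read from the stream returned by HttpMethod.getResponseBodyAsStream(): copy it through a per-thread reusable buffer and stop at a size cap. The class name CappedBodyReader, the method readCapped, the 8 KB buffer size and any cap value are hypothetical; only JDK classes are used, and charset names are passed as Strings so the code stays compatible with Java 1.5.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class CappedBodyReader {

    // One copy buffer per thread (hypothetical 8 KB size), reused for every page.
    private static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<byte[]>() {
        @Override
        protected byte[] initialValue() {
            return new byte[8 * 1024];
        }
    };

    /**
     * Reads at most maxBytes from in and decodes them using charsetName.
     * Bytes beyond the cap are left unread; the caller is still
     * responsible for releasing the underlying connection.
     */
    public static String readCapped(InputStream in, int maxBytes, String charsetName)
            throws IOException {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        byte[] buf = BUFFER.get();
        ByteArrayOutputStream out =
                new ByteArrayOutputStream(Math.min(maxBytes, buf.length));
        int remaining = maxBytes;
        while (remaining > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n == -1) {
                break; // end of stream reached before the cap
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        // toString(charsetName) avoids the extra copy toByteArray() would make;
        // a multi-byte character cut at the cap becomes a replacement character.
        return out.toString(charsetName);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html><body>hello</body></html>".getBytes("UTF-8");
        // A 12-byte cap keeps only the first two tags.
        System.out.println(readCapped(new ByteArrayInputStream(page), 12, "UTF-8"));
        // prints "<html><body>"
        System.out.println(readCapped(new ByteArrayInputStream(page), 1024, "UTF-8").length());
        // prints 31
    }
}
```

In getPageApache the call would become something like readCapped(method.getResponseBodyAsStream(), 512 * 1024, "ISO-8859-1"), with a null check first because the stream may be null when there is no body; the cap and the charset are placeholders, and ideally the charset would come from the response's Content-Type header. Capping bounds what is held in memory, not necessarily what is transferred, and method.releaseConnection() still belongs in the finally block.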
>
> Here is the code where I initialize the HttpClient:
>
>     private HttpClient httpClient;
>
>     public CrawlerControllerThread(QueueThread qt, MessageReceiver receiver,
>             int maxThreads, String flag, boolean filter,
>             String filterString, String dbType) {
>         this.qt = qt;
>         this.receiver = receiver;
>         this.maxThreads = maxThreads;
>         this.flag = flag;
>         this.filter = filter;
>         this.filterString = filterString;
>         this.dbType = dbType;
>         threads = new ArrayList();
>         lastStatus = new HashMap();
>
>         HttpConnectionManagerParams htcmp = new HttpConnectionManagerParams();
>         htcmp.setMaxTotalConnections(maxThreads);
>         htcmp.setDefaultMaxConnectionsPerHost(10);
>         htcmp.setSoTimeout(5000);
>         MultiThreadedHttpConnectionManager mtcm =
>                 new MultiThreadedHttpConnectionManager();
>         mtcm.setParams(htcmp);
>         httpClient = new HttpClient(mtcm);
>     }
>
> The client reference to httpClient is then passed to all the crawling
> threads where it is used as follows:
>
>     private String getPageApache(URL pageURL, ArrayList unProcessed) {
>         SaveURL saveURL = new SaveURL();
>         HttpMethod method = null;
>         HttpURLConnection urlConnection = null;
>         String rawPage = "";
>         try {
>             method = new GetMethod(pageURL.toExternalForm());
>             method.setFollowRedirects(true);
>             method.setRequestHeader("Content-type", "text/html");
>             int statusCode = httpClient.executeMethod(method);
>             // urlConnection = new HttpURLConnection(method,
>             // pageURL);
>             logger.debug("Requesting: " + pageURL.toExternalForm());
>
>             rawPage = method.getResponseBodyAsString();
>             //rawPage = saveURL.getURL(urlConnection);
>             if (rawPage == null) {
>                 unProcessed.add(pageURL);
>             }
>             return rawPage;
>         } catch (IllegalArgumentException e) {
>             //e.printStackTrace();
>         } catch (HttpException e) {
>             //e.printStackTrace();
>         } catch (IOException e) {
>             unProcessed.add(pageURL);
>             //e.printStackTrace();
>         } finally {
>             if (method != null) {
>                 method.releaseConnection();
>             }
>             try {
>                 if (urlConnection != null) {
>                     if (urlConnection.getInputStream() != null) {
>                         urlConnection.getInputStream().close();
>                     }
>                 }
>             } catch (IOException e) {
>                 // TODO Auto-generated catch block
>                 e.printStackTrace();
>             }
>             urlConnection = null;
>             method = null;
>         }
>         return null;
>     }
>
> As you can see, I release the connection in the finally statement, so
> that should not be a problem. Upon running the getPageApache above the
> returned page as a string is processed and then set to null for garbage
> collection. I have been playing with this, closing streams, using
> HttpUrlConnection instead of the GetMethod, and I cannot find the
> answer. Indeed it seems the answer does not lie in my code.
>
> I greatly appreciate any help that anyone can give me, I am at the end
> of my rope with this one.
>
> James
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
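James's observation that dropping the controller's references to the crawler threads helped fits Steve's point that the application, not the library, was holding on to something. A minimal sketch, assuming the controller only needs aggregate numbers such as pages fetched and download rate: workers report into shared AtomicLong counters, run in a fixed-size pool from java.util.concurrent (available since Java 5) instead of 300 hand-managed threads, and nothing keeps a reference to a worker or its page once the task returns. CrawlStats, recordPage and the pool size of 20 are hypothetical, and the Runnable body is only a stand-in for the real fetch-and-parse work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class CrawlStats {

    // Aggregate counters only: no worker threads or page contents are stored here.
    private final AtomicLong pages = new AtomicLong();
    private final AtomicLong bytes = new AtomicLong();
    private final long startNanos = System.nanoTime();

    /** Called by a worker after it has finished with a page. */
    public void recordPage(int pageBytes) {
        pages.incrementAndGet();
        bytes.addAndGet(pageBytes);
    }

    public long pageCount() {
        return pages.get();
    }

    public long byteCount() {
        return bytes.get();
    }

    /** Average download rate since construction, in bytes per second. */
    public double bytesPerSecond() {
        double seconds = (System.nanoTime() - startNanos) / 1e9;
        return seconds > 0 ? bytes.get() / seconds : 0.0;
    }

    public static void main(String[] args) throws InterruptedException {
        final CrawlStats stats = new CrawlStats();
        // Hypothetical pool size; the right value is something to measure.
        ExecutorService pool = Executors.newFixedThreadPool(20);
        for (int i = 0; i < 1000; i++) {
            final int size = 1000 + i;
            pool.execute(new Runnable() {
                public void run() {
                    // Stand-in for fetching one page; the array becomes
                    // unreachable when run() returns, only its size is kept.
                    byte[] page = new byte[size];
                    stats.recordPage(page.length);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(stats.pageCount() + " pages, " + stats.byteCount() + " bytes");
        // prints "1000 pages, 1499500 bytes"
    }
}
```

A bounded pool also caps how many pages are in memory at any one moment, which is one concrete way of following Oleg's advice to reduce the garbage the crawler produces rather than tuning the collector.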