Message-ID: <4A36CE22.9040804@odi.ch>
Date: Tue, 16 Jun 2009 00:41:38 +0200
From: Ortwin Glück <odi@odi.ch>
To: HttpComponents Project <dev@hc.apache.org>
Subject: Re: Question on HTTPClient-675
References: <4A368361.9090405@odi.ch> <4A369EE4.8030409@odi.ch>

D H wrote:
>
> I agree about this, they wanted proof of a fix before any changes would be
> made and my manager is still saying there must be proof before code changes
> even now after I showed him your email and the documentation from the site.

Nice way of thinking -- at a university where time and money are infinite
resources. Wishful thinking anywhere else. Enter The Real World.

>> Anyway if this is so hard to reproduce and profiling didn't give you any
>> idea what's broken, why do you suspect something in this particular piece
>> of code is the cause of your problem?
>
> That's a very good question, I was given the source code, no explanation of
> their code and told HttpClient was the problem.

Why am I not convinced? But it's entirely possible. The best advice I can
give is: upgrade, for God's sake, and fix obvious mistakes in the use of the
API. Use monitoring tools like JConsole / JVisualVM, jmap, netstat, top and
the like to see whether you have a garbage collection problem or any obvious
resource leaks, then take appropriate action.

> It is a rather large
> codebase so I'm taking their word for it. They've apparently worked on it
> sporadically for a couple of months to isolate it to HttpClient, and my task
> is to prove it is causing this problem with JMeter in a self-contained code
> sample. This problem has only been seen in Production and only after almost
> a week of running 24/7 so it's hard to duplicate it easily. I've sent over
> two hundred thousand HttpClient requests without seeing the problem so I'd
> rather see this code fix go to Production and test that way personally.
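One cheap signal for the resource-leak case mentioned above is the process's
open file descriptor count over time: a connection leak in an HTTP client
typically shows up as a steadily climbing number. A minimal sketch, assuming
Linux (it reads /proc) and Python on the host; the function names are mine,
not from any tool:

```python
import os
import time


def fd_count(pid: int) -> int:
    """Number of open file descriptors for a process (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def watch_fds(pid: int, samples: int = 3, interval: float = 1.0) -> list:
    """Take a few samples; a monotonically growing count over a long run
    hints at a socket or file leak rather than normal churn."""
    counts = []
    for _ in range(samples):
        counts.append(fd_count(pid))
        time.sleep(interval)
    return counts


if __name__ == "__main__":
    # Watch ourselves as a smoke test; point pid at the JVM in real use.
    print(watch_fds(os.getpid(), samples=2, interval=0.1))
```

In practice you would run this (or the equivalent `ls /proc/<pid>/fd | wc -l`
in cron) against the production JVM and graph the numbers over days, since
the problem here only appears after a week of uptime.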
Sounds like a typical production problem to me: it can take weeks to show up,
there is no way to trigger it on purpose, and maybe it has only ever been
encountered on the production system. Face it, you will not reproduce it
locally in reasonable time. It may depend on the workload you are running.
Maybe it's even platform specific and won't trigger on your testing platform.
And by platform I don't mean just the OS: the processor type (single core vs.
multi core) can make a huge difference.

What can really help you is to expect the situation in production. Instead of
panicking and quickly restarting, take your time and have the right tools at
hand to find out what is going on at that moment. Maybe even set up a "post
mortem" job that gathers useful information in case this happens while
everyone is sleeping.

- Is it swapping?
- Has the VM run out of memory (stack, heap, perm gen, code cache) and is it
  constantly GCing?
- Has the OS run out of file descriptors?
- Which signals does it react to?
- Is it creating threads at a high rate? Are there just too many runnable
  threads?
- Is it busy-waiting? Is it looping endlessly? Is it I/O bound?
- Is there lock contention, or even a deadlock?
- Is it blocking on I/O or the network?
- Is it waiting for the DB? What's going on on the DB? What's going on on
  the network?
- Is your logging detailed enough to give you the information you need?

If all else fails, you may have to live with it and instead set up monitoring
infrastructure that can reliably detect the situation and restart the process.

>
> I really appreciate you taking the time to answer my emails, thank you very
> much.
>
> Sincerely,
> David Hamilton

Ortwin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org
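The "post mortem" job suggested above can be sketched as a small script that
snapshots the usual diagnostics before anyone restarts the process. This is
an illustrative sketch, not a recipe from the thread: it assumes a Linux box
with the JDK tools (jstack, jmap) and netstat/vmstat on the PATH, and the
function names are mine:

```python
import subprocess
import time
from pathlib import Path


def postmortem_commands(pid: int) -> list:
    """Diagnostic commands worth capturing while the problem is happening.
    These map to the questions above: threads, heap, sockets, swapping."""
    return [
        ["jstack", str(pid)],          # thread dump: deadlocks, runnable storms
        ["jmap", "-histo", str(pid)],  # heap histogram: what is filling memory
        ["netstat", "-an"],            # socket states: e.g. CLOSE_WAIT pile-ups
        ["vmstat", "1", "3"],          # swapping and CPU saturation
    ]


def gather(pid: int, outdir: str = "postmortem") -> list:
    """Run each command and save its output to a timestamped directory."""
    out = Path(outdir) / time.strftime("%Y%m%d-%H%M%S")
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for cmd in postmortem_commands(pid):
        path = out / (cmd[0] + ".txt")
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=60)
            path.write_text(result.stdout + result.stderr)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hanging tool is itself worth recording.
            path.write_text(f"failed: {exc}")
        written.append(path)
    return written
```

Hooked up to whatever monitor detects the stall, this leaves behind enough
evidence to analyse the incident in the morning instead of losing it to a
panicked restart.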