hc-dev mailing list archives

From Ortwin Glück <...@odi.ch>
Subject Re: Question on HTTPClient-675
Date Mon, 15 Jun 2009 22:41:38 GMT


D H wrote:
> 
> I agree about this, they wanted proof of a fix before any changes would be
> made and my manager is still saying there must be proof before code changes
> even now after I showed him your email and the documentation from the site.

Nice way of thinking -- at a university where time and money are infinite 
resources. Wishful thinking anywhere else. Enter The Real World.

>> Anyway if this is so hard to reproduce and profiling didn't give you any
>> idea what's broken, why do you suspect something in this particular piece of
>> code is the cause of your problem?
> 
> 
> That's a very good question, I was given the source code, no explanation of
> their code and told HttpClient was the problem.  

Why am I not convinced? Still, it is quite possible. The best advice I can give is:
upgrade, for God's sake, and fix obvious mistakes in the use of the API. Use
monitoring tools such as JConsole / JVisualVM, jmap, netstat, top and the like to
see whether you have a garbage collection problem or any obvious resource leaks,
then take appropriate action.
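
As a rough illustration of the numbers those tools show, the same counters can
be read from inside the JVM through the standard java.lang.management API. This
is only a sketch: the class name JvmSnapshot and the output format are made up
for this example, and it is no substitute for attaching JConsole or running jmap.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of HttpClient or of the application discussed.
public class JvmSnapshot {

    /** Returns a one-line summary of heap, thread and GC counters. */
    public static String summary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcMillis += Math.max(0, gc.getCollectionTime());
        }

        return "heapUsed=" + heap.getUsed()
                + " heapMax=" + heap.getMax()          // -1 if no maximum is defined
                + " threads=" + threads.getThreadCount()
                + " peakThreads=" + threads.getPeakThreadCount()
                + " gcCount=" + gcCount
                + " gcMillis=" + gcMillis;
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Logging such a line every few minutes gives you a trend to look at: a heap that
only ever grows, or a thread count that keeps climbing, points to a leak long
before the process stops responding.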

> It is a rather large
> codebase so I'm taking their word for it.  They've apparently worked on it
> sporadically for a couple of months to isolate it to HttpClient, and my task
> is to prove it is causing this problem with JMeter in a self-contained code
> sample.  This problem has only been seen in Production and only after almost
> a week of running 24/7 so it's hard to duplicate it easily.  I've sent over
> two hundred thousand HttpClient requests without seeing the problem so I'd
> rather see this code fix go to Production and test that way personally.

Sounds like a typical production problem to me: it can take weeks to see it,
there is no way you can trigger it on purpose, and maybe it has only ever been
encountered on the production system.

Face it, you will not reproduce it locally in reasonable time. It may depend on
the workload you are running. Maybe it's even platform specific and will not
trigger on your testing platform. And by platform I don't mean just the OS: the
processor type (single core, multi core) can also make a huge difference.

What can really help you is to be prepared for the situation in production.
Instead of panicking and quickly restarting, take your time and have the right
tools at hand to find out what is going on at that moment. Maybe even set up a
"post mortem" job that gathers useful information in case this happens while
everyone is asleep. Is it swapping? Has the VM run out of memory (stack, heap,
perm gen, code) and is it constantly GCing? Has the OS run out of file
descriptors? To which signals does it react? Is it creating threads at a high
rate? Are there just too many runnable threads? Is it busy waiting? Is it
looping endlessly? Is it I/O bound? Is it suffering from lock contention, or even
deadlocked? Is it blocking on I/O or network? Is it waiting for the DB? What's
going on on the DB? What's going on on the network? Are your logs detailed enough
to give you the information you need?
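
A few of the thread-related questions above can be answered from inside the JVM.
The following is a minimal sketch of what such a "post mortem" job might
collect, using the standard ThreadMXBean; the class name PostMortem and the
report format are assumptions for this example. Note that
findDeadlockedThreads() needs Java 6 and a VM that supports synchronizer
monitoring, and that questions about swapping, file descriptors or the network
still need OS-level tools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of HttpClient or of the application discussed.
public class PostMortem {

    /** Counts live threads per state (RUNNABLE, BLOCKED, WAITING, ...). */
    public static Map<Thread.State, Integer> threadStates() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts =
                new EnumMap<Thread.State, Integer>(Thread.State.class);
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // thread ended while the dump was taken
            }
            Integer old = counts.get(info.getThreadState());
            counts.put(info.getThreadState(), old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** Number of threads the JVM reports as deadlocked, 0 if none. */
    public static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    /** Builds a small text report suitable for appending to a log file. */
    public static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("deadlocked=").append(deadlockedThreadCount()).append('\n');
        for (Map.Entry<Thread.State, Integer> e : threadStates().entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

A large and growing BLOCKED count hints at lock contention, many RUNNABLE
threads at busy waiting or an endless loop, and a non-zero deadlock count speaks
for itself.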

If all else fails, you may have to live with it and instead set up monitoring
infrastructure that can reliably detect the situation and restart the process.
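
"Reliably" matters here: a single slow health check should not trigger a
restart. One simple way to express that, shown as a sketch only, is to require
several consecutive failed checks. The class name RestartPolicy and the
threshold are assumptions for this example; how the health check is performed
and how the process is actually restarted are left to your monitoring setup.

```java
// Hypothetical helper, not part of HttpClient or of any monitoring product.
public class RestartPolicy {
    private final int threshold;
    private int consecutiveFailures;

    /** @param threshold number of consecutive failed checks that triggers a restart */
    public RestartPolicy(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    /**
     * Records the result of one health check.
     *
     * @return true if the caller should restart the process now
     */
    public synchronized boolean record(boolean healthy) {
        if (healthy) {
            consecutiveFailures = 0; // any success clears the failure streak
            return false;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= threshold) {
            consecutiveFailures = 0; // start counting afresh after the restart
            return true;
        }
        return false;
    }
}
```

With new RestartPolicy(3), two failed checks followed by a successful one do
nothing, while three failures in a row return true once.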

> 
> I really appreciate you taking the time to answer my emails, thank you very
> much.
> 
> Sincerely,
> David Hamilton

Ortwin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org

