hadoop-zookeeper-user mailing list archives

From Benjamin Reed <br...@yahoo-inc.com>
Subject Re: zookeeper on ec2
Date Thu, 03 Sep 2009 16:54:29 GMT
these suggestions would be great to put in a faq!

thanx ted


Ted Dunning wrote:
> I always used a large node for ZK to avoid sharing the machine, but my
> reason for doing that turned out to be incorrect.  In fact, my problem had
> to do with GC on the client side.
> I can't believe that they are seeing 50 second delays in EC2 due to I/O
> contention.  GC can do that, but only on a large heap.  Massive swapping of
> code pages can also cause this.
> My debug path here would be:
> a) verify the facts.  The key fact is that the ZK cluster is occasionally
> giving massive latency.  This must be verified to be the real problem and
> not an accidental incident.  It is possible that the problem is not where we
> think it is.
> b) check for the usual configuration suspects.  ZK should be alone on a
> machine.  DNS should be checked.  Connectivity should be checked between all
> hosts.
> c) look for swapping, look at GC logs.  Something has to give a clue as to
> how the latency is 1000x longer than usual.
> d) fix whatever turned up in step (b) or (c).
> I am at a loss here other than this general advice.  I strongly suspect that
> something is being observed incorrectly or the machines are being massively
> abused.
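
Step (a) above, verifying that the ZK servers really do show occasional
massive latency, could be approached with a small probe like the one below.
This is only a sketch, not something from the thread: it times ZooKeeper's
four-letter "ruok" command against one server. The function names, the 1
second threshold, and the default port 2181 are assumptions for illustration,
and newer ZooKeeper releases may need "ruok" enabled in the four-letter-word
whitelist before the server will answer it.

```python
import socket
import time


def probe_latency(host, port=2181, timeout=5.0):
    """Send the four-letter 'ruok' command and time the full round trip.

    Returns (reply_text, elapsed_seconds). The server is expected to
    answer 'imok' and then close the connection.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"ruok")
        chunks = []
        while True:
            data = sock.recv(64)
            if not data:  # server closed the connection after replying
                break
            chunks.append(data)
    elapsed = time.monotonic() - start
    return b"".join(chunks).decode("ascii", "replace"), elapsed


def find_slow_probes(samples, threshold_s=1.0):
    """Return the (timestamp, elapsed) samples slower than threshold_s.

    threshold_s is an arbitrary illustrative cutoff, not a ZooKeeper value.
    """
    return [(ts, elapsed) for ts, elapsed in samples if elapsed > threshold_s]
```

Running probe_latency in a loop from a client host and from the server host
itself, and recording the timestamps of the slow samples, gives something
concrete to line up against the GC and swap activity mentioned in step (c).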
> On Wed, Sep 2, 2009 at 12:37 PM, Patrick Hunt <phunt@apache.org> wrote:
>> Given that a single disk is being used (not a dedicated disk for the
>> transaction log), and that this host is highly virtualized (ec2), I
>> suspect the most likely cause is IO. Specifically, when the zk cluster
>> writes data to disk (due to a client write) it must sync the
>> transaction log to disk. This sync behavior can impact the latency seen by
>> the clients. What type of ec2 node are you using? Ted, do you have any
>> insight on this? Any guidelines for the type of ec2 node to use for running
>> a zk cluster?
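
The IO theory quoted above can be checked directly by timing syncs on the
disk in question. The following is a rough sketch under my own assumptions
(the function name, the 512 byte payload, and the sample count are made up
for illustration): it appends small writes to a temporary file and measures
how long each fsync takes, which loosely mimics a transaction log being
synced after each write.

```python
import os
import tempfile
import time


def time_fsyncs(directory, count=20, payload=b"x" * 512):
    """Append payload and fsync `count` times; return per-sync seconds.

    The temporary file is created inside `directory` so that the
    measurement hits the same disk as that directory.
    """
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            os.write(fd, payload)
            start = time.monotonic()
            os.fsync(fd)  # force the write out to the device
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies
```

Pointing this at the directory that holds the transaction log and looking at
max(latencies) over a longer run should show whether syncs on that disk ever
stall for anything close to the delays being reported.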
