zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Understanding Load on Zookeeper Box
Date Fri, 25 May 2012 00:31:26 GMT
On Thu, May 24, 2012 at 3:42 PM, Matthew Ward <matt@pixelpipe.com> wrote:
> I have a couple theories and questions I was hoping to clear up (all java based 3.3.4):
> 1) I have been trying to troubleshoot the reason for high system wait
> time on one of our zookeeper instances. The theory I have is that
> setting watches increases the system wait load. Does this theory sound
> accurate?

The two most common causes of high latency are GC/swapping and high
disk utilization on the transaction log (WAL). Check for that first.

Have you seen this page?
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting

Given that you mention AWS in question 2, that might also be related.
Remember that you're not accessing the disk(s) directly, so disk issues
are even more likely. The main issue is that we need to fsync the
txnlog before responding to the proposal. (I often use strace on the
fsync/fdatasync system calls to track/graph this.)
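
As a rough sketch of the strace approach mentioned above: assuming the
server was traced with something like
`strace -f -T -e trace=fsync,fdatasync -p <pid> -o sync.trace` (the
output file name is only an example), the per-call durations that -T
appends in angle brackets can be collected and summarized. The helper
names below are hypothetical and not part of ZooKeeper or strace.

```python
import re
import statistics

# Matches successful fsync/fdatasync lines written by `strace -T`, e.g.
#   1234 fdatasync(57)            = 0 <0.012000>
#   1235 <... fsync resumed> )    = 0 <0.250000>
# The trailing <seconds> field is the time spent inside the system call.
SYNC_LINE = re.compile(
    r"(?:\b(?:fsync|fdatasync)\(\d+\)"
    r"|<\.\.\. (?:fsync|fdatasync) resumed>\s*\))"
    r"\s*=\s*0\s+<(\d+\.\d+)>"
)


def sync_latencies(lines):
    """Return the durations, in seconds, of successful fsync/fdatasync calls."""
    durations = []
    for line in lines:
        match = SYNC_LINE.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def summarize(durations):
    """Return count, mean, max and an approximate p99, or None if empty."""
    if not durations:
        return None
    ordered = sorted(durations)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "max": ordered[-1],
        "p99": ordered[p99_index],
    }


def summarize_file(path):
    """Summarize a trace file written by strace's -o option."""
    with open(path) as trace:
        return summarize(sync_latencies(trace))
```

Calling `summarize_file("sync.trace")` returns a small dict of latency
statistics. Sync times that are regularly large compared with normal
request latency would point at the transaction log device rather than
at watches.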

> 2) Question 2 is a follow up to the first... whenever I do a watch and
> wait for the event, I have an 'insurance policy' (since AWS is fun...)
> of setting a mutex with a timeout, before retrying the operation and
> potentially setting another watch. How does zookeeper handle duplicate
> watches? Am I exacerbating the system wait load issue by setting
> duplicate watches? Is there a way I should cancel the watch?

A particular session can establish only a single watch on a particular
path. Setting the same watch again therefore has no negative effect
(other than a round-trip read to the server, of course).
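
To illustrate why a repeated registration is harmless, here is a toy
model of a watch table keyed by path, with one entry per session. It is
a simplified sketch for illustration only, not ZooKeeper's actual
implementation; the class and method names are made up.

```python
from collections import defaultdict


class WatchTable:
    """Toy model: at most one watch per (path, session) pair."""

    def __init__(self):
        # path -> set of session ids that currently have a watch on it
        self._watchers = defaultdict(set)

    def add_watch(self, path, session_id):
        """Register a watch; return True if it is new, False if a duplicate."""
        sessions = self._watchers[path]
        is_new = session_id not in sessions
        sessions.add(session_id)  # adding an existing member is a no-op
        return is_new

    def trigger(self, path):
        """Fire and clear the one-shot watches on path; return who to notify."""
        return self._watchers.pop(path, set())


table = WatchTable()
first = table.add_watch("/locks/job", session_id=1)   # initial watch
retry = table.add_watch("/locks/job", session_id=1)   # retry after timeout
notified = table.trigger("/locks/job")                # one notification
```

In this model the retry leaves the table unchanged, so the session still
receives a single notification when the path changes.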

Patrick
