Date: Mon, 12 Apr 2010 17:12:02 -0700
From: Kevin Webb
To: zookeeper-user@hadoop.apache.org
Subject: Re: znode cversion decreasing?
Organization: UC San Diego

On Mon, 12 Apr 2010 16:48:24 -0700
Patrick Hunt wrote:

> Any idea why the connection is flapping so badly? Is this
> client->server connection remote or in colo? (not that that should
> affect the operations of the server...)
>
> Patrick

The clients are all over the world. I have three servers: one in the US, one in Germany, and one in South Korea. Clients connect from North/South America to the US server, from Europe to the German server, and from Asia/Australia to the Korean server. This is all happening on PlanetLab, which is sometimes heavily oversubscribed. In short, any number of bad things could be causing us to lose connectivity.

From your previous message:

> What's the ping time btw colos? 2sec tickTime and esp the initLimit
> and syncLimit are pretty low. You are allowing for only 4 seconds to
> d/l the data repository to a remote server. Even in-colo we typically
> use a higher value... but you may not want to change until we can
> reproduce this. You probably want a 4 sec tickTime and 60/40sec (so
> settings of 15/10) for the init/sync limits (something like that,
> depending on latencies/bandwidth you see)

Interesting. I thought I was using the default config parameters with only a modified data directory and my own hostnames, but I see now that the defaults are larger. Those values should certainly be larger for the environment I'm running in. I'll leave them as they are for now to see if we can reproduce the problem, though I'll eventually need to fix them as my deadline approaches.
:)

> Probably reaching for straws but could you print "path", just to
> confirm it's what you know it is?

Sure, I can do this. I only have a single top-level znode, though, so I don't think this is the problem, but it can't hurt to double-check.

-Kevin
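The tickTime/initLimit/syncLimit advice quoted above comes down to simple arithmetic: initLimit and syncLimit are counted in ticks, so the wall-clock timeout is the limit multiplied by tickTime. The Python sketch below illustrates that relationship and renders the suggested 4 s / 60 s / 40 s targets as zoo.cfg-style lines. The helper names (`limit_in_ticks`, `timeout_seconds`, `zoo_cfg_lines`) are hypothetical, and the target values are just the rough figures from this thread, not tuned recommendations for any particular deployment.

```python
import math


def limit_in_ticks(target_seconds: float, tick_ms: int) -> int:
    """Smallest whole number of ticks that covers target_seconds."""
    return math.ceil(target_seconds * 1000 / tick_ms)


def timeout_seconds(limit_ticks: int, tick_ms: int) -> float:
    """Wall-clock timeout implied by a limit expressed in ticks."""
    return limit_ticks * tick_ms / 1000


def zoo_cfg_lines(tick_ms: int, init_s: float, sync_s: float) -> list[str]:
    """Render tickTime/initLimit/syncLimit as zoo.cfg-style key=value lines.

    Hypothetical helper: it only converts target durations into tick counts.
    """
    return [
        f"tickTime={tick_ms}",
        f"initLimit={limit_in_ticks(init_s, tick_ms)}",
        f"syncLimit={limit_in_ticks(sync_s, tick_ms)}",
    ]


if __name__ == "__main__":
    # Illustration only: a 2-tick limit at tickTime=2000 allows just 4 s.
    print(timeout_seconds(2, 2000))
    # Suggested targets from the thread: 4 s ticks, ~60 s init, ~40 s sync.
    for line in zoo_cfg_lines(4000, 60, 40):
        print(line)
```

With a 4000 ms tick, 60 s and 40 s work out to exactly 15 and 10 ticks, which matches the "settings of 15/10" mentioned above. Rounding up with `ceil` is a design choice so that a target that is not an exact multiple of tickTime never yields a shorter timeout than intended.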