From: Camille Fournier
To: "user@zookeeper.apache.org"
Date: Tue, 28 Apr 2015 14:54:47 -0400
Subject: Re: Leader election duration

Just out of curiosity, if you start the 5 node cluster up with only 3 of the nodes to begin with (like, config 5, but only bring up 3 processes), does it speed up the leader election or is it still slow?

C

On Tue, Apr 28, 2015 at 1:41 PM, Karol Dudzinski wrote:

> Hi,
>
> We're seeing some rather strange leader election in one of our clusters.
> The duration reported by the "FOLLOWING - LEADER ELECTION TOOK" log line
> (and equivalent for the leader) seems to vary hugely. During one rolling
> reboot, I saw the number reported as small as 39ms and as large as 57
> seconds (difference in units is not a typo). The average is just about 10
> seconds and std dev also about 10 seconds. So the time taken is not only
> quite large, it's also very variable.
>
> We have other clusters but the average election time in those is in the
> hundreds of millis with std dev in a similar ballpark. I guess one
> difference is the "slow" cluster is 5 participants while the others are 3,
> which may be a factor but I wouldn't expect it to make two orders of
> magnitude difference!
>
> So my question is, what factors contribute to the election time reported
> by these log lines? And what can we do to speed this up?
>
> As far as I understand from logs and a quick browse through the code that
> time is the time to select a leader. Syncing up to the leader happens
> after that. The syncing part I can understand will vary depending on load
> but I don't see why selecting the leader would.
>
> Thanks,
> Karol
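
For comparing the two cluster setups, a rough way to get the average and std dev quoted above is to pull the numbers straight out of the "LEADER ELECTION TOOK" log lines. This is only a sketch, and it makes some assumptions: that the duration is the first number after that phrase, that it is in milliseconds, and that the leader's line has the same shape as the follower's. The sample lines and the helper names (election_times_ms, summarize) are made up for illustration.

```python
import re
import statistics

# Assumption: the duration is the first integer after "LEADER ELECTION TOOK",
# and the word just before " - LEADER" is the role (e.g. FOLLOWING).
ELECTION_RE = re.compile(r"(\w+) - LEADER ELECTION TOOK\D*(\d+)")


def election_times_ms(lines):
    """Collect election durations (assumed to be milliseconds) from log lines."""
    times = []
    for line in lines:
        match = ELECTION_RE.search(line)
        if match:
            times.append(int(match.group(2)))
    return times


def summarize(times):
    """Return count, min, max, mean and sample std dev of the durations."""
    return {
        "count": len(times),
        "min": min(times),
        "max": max(times),
        "mean": statistics.mean(times),
        # stdev needs at least two samples
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
    }


# Made-up sample lines, not taken from a real log.
sample = [
    "FOLLOWING - LEADER ELECTION TOOK - 39",
    "LEADING - LEADER ELECTION TOOK - 57000",
    "some unrelated log line",
]

if __name__ == "__main__":
    print(summarize(election_times_ms(sample)))
```

Running it once over the logs from the 3-of-5 startup and once over a normal rolling reboot would give numbers that can be compared directly.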