Subject: Re: Leader election problems
From: Filip Deleersnijder
Date: Thu, 25 Jun 2015 08:39:56 +0200
To: user@zookeeper.apache.org
Message-Id: <24927B98-DABB-4D6D-8BCB-9F8D60896EAC@motum.be>
References: <0ABDD99C-B3C1-4278-B6E8-6A997658B988@motum.be>

Hi,

Thanks for your response.

Our application consists of 8 automatic vehicles in a warehouse setting.
Those vehicles need to reach some consensus decisions, and that is what we
use Zookeeper for.
Because vehicles can come and go at random, we installed a ZK participant
on every vehicle. The ZK client is a separate piece of software that also
runs on the vehicles.

Therefore:
- We cannot choose the number of ZK participants, because it depends on
  the number of vehicles.
- The participants communicate over Wifi.
- The client is running on the same machine, so it communicates over the
  local network.

We are running Zookeeper version 3.4.6.
Our zoo.cfg can be found below this e-mail.

Thanks in advance!

Filip

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=c:/motum/config/MASS/ZK
# the port at which the clients will connect
clientPort=2181

server.1=172.17.35.11:2888:3888
server.2=172.17.35.12:2888:3888
server.3=172.17.35.13:2888:3888
server.4=172.17.35.14:2888:3888
server.5=172.17.35.15:2888:3888
server.6=172.17.35.16:2888:3888
server.7=172.17.35.17:2888:3888
server.8=172.17.35.18:2888:3888

# The number of snapshots to retain in dataDir
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

> On 24 Jun 2015, at 18:54, Raúl Gutiérrez Segalés wrote:
>
> Hi,
>
> On 24 June 2015 at 06:05, Filip Deleersnijder wrote:
>
>> Hi,
>>
>> Let’s start with some description of our system:
>>
>> - We are using a Zookeeper cluster with 8 participants for an application
>> with mobile nodes (connected over Wifi).
>>
>
> You mean the participants talk over wifi or the clients?
>
>
>> (The IPs of the different nodes follow this structure:
>> Node X has IP 172.17.35.1X)
>>
>
> Why 8 and not an odd number of machines (i.e.:
> http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
> )?
>
>> - It is not that unusual for a node to be shut down or restarted
>> - We haven’t benchmarked the number of write operations yet, but I would
>> estimate that it would be less than 10 writes / second
>>
>
> What version of ZK are you using?
>
>
>>
>> The problem we are having, however, is that sometimes(*) some instances
>> seem to be having problems with leader election.
>> Under the header “Attachment 1” below, you can find the leader election
>> times that were needed over 24h (from 1 node). On average it took more
>> than 1 minute!
>> I assume that this is not normal behaviour? (If somebody could confirm
>> that in an 8-node cluster these are not normal leader election times, that
>> would be nice.)
>>
>> In attachment 2, I included an extract from the logging during a leader
>> election that took 101874ms for 1 node (server 2).
>>
>> Any help is greatly appreciated.
>> If further or more specific logging is required, please ask!
>>
>>
> Do you mind sharing a copy of your config file (zoo.cfg)? Thanks!
>
>
> -rgs
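
To make the question about 8 versus an odd number of participants concrete, here is a minimal Python sketch of the majority arithmetic, together with the tick-based limits from the zoo.cfg above converted to milliseconds. It assumes the default majority quorum with all 8 servers as voting participants. The function names (quorum_size, tolerated_failures, limit_ms) are hypothetical helpers for illustration, not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the voting participants (assumes majority quorums)."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Participants that may be unreachable while a majority can still form."""
    return ensemble_size - quorum_size(ensemble_size)


def limit_ms(tick_time_ms: int, ticks: int) -> int:
    """Convert a limit expressed in ticks (initLimit, syncLimit) to milliseconds."""
    return tick_time_ms * ticks


if __name__ == "__main__":
    # Compare the 8-participant ensemble with its odd-sized neighbours.
    for n in (7, 8, 9):
        print(f"ensemble={n} quorum={quorum_size(n)} tolerated={tolerated_failures(n)}")

    # Values taken from the zoo.cfg in this thread: tickTime=2000, initLimit=10, syncLimit=5.
    print("initLimit:", limit_ms(2000, 10), "ms")
    print("syncLimit:", limit_ms(2000, 5), "ms")
```

Under these assumptions, 8 participants need 5 reachable servers to form a quorum and tolerate 3 failures, which is the same failure tolerance as 7 participants; the eighth server adds no extra tolerance.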