From: Alexander Shraer
Date: Thu, 25 Jun 2015 14:36:25 -0700
Subject: Re: Incrementally bootstrapping a 3.5.0-alpha cluster?
To: "user@zookeeper.apache.org"

This message by itself doesn't indicate a failure; it's quite normal. But if
you have a situation where the ensemble gets stuck or doesn't elect a leader,
please open a JIRA and post your server logs.

Thanks,
Alex

On Thu, Jun 25, 2015 at 9:43 AM, Benjamin Anderson wrote:

> Hi Alexander, I've had much better luck with the codebase @91ecdac,
> but I've still observed the "Have smaller server identifier" type
> failure at least once. It's reliable enough for me to work around the
> remaining failures, at least.
>
> Thanks!
> --
> b
>
> On Wed, Jun 24, 2015 at 8:20 AM, Alexander Shraer wrote:
> > Hi Benjamin, I'm curious whether this worked.
> >
> > Thanks,
> > Alex
> >
> > On Sat, Jun 20, 2015 at 7:40 PM, Alexander Shraer wrote:
> >
> >> There have been bug fixes since the 2014 release, so if it doesn't
> >> work, perhaps you could try with trunk:
> >>
> >> svn checkout http://svn.apache.org/repos/asf/zookeeper/trunk <dir>
> >>
> >> On Sat, Jun 20, 2015 at 7:35 PM, Alexander Shraer wrote:
> >>
> >>> Hi,
> >>>
> >>> Approach A isn't supposed to work, since each server forms its own
> >>> ensemble. Each server is the leader of its own ensemble, so when you
> >>> try to reconfigure, it expects the other server to connect as a
> >>> follower, but that doesn't happen. The error just means that you
> >>> can't reconfigure because you would lose the quorum (in an ensemble
> >>> of 2 servers both must ack every request, and here you won't have
> >>> that since they are not talking to each other).
> >>>
> >>> Approach B is supposed to work, whether the first server is 2 or 1.
> >>> There may be a bug, of course, but I just tried the scenario that
> >>> fails for you (as I understood it) locally and it worked. Here is my
> >>> setup; perhaps you can send me yours if it still doesn't work.
> >>>
> >>> server 1:
> >>> dataDir=/home/shralex/zk-sat/zookeeper1
> >>> standaloneEnabled=false
> >>> syncLimit=2
> >>> initLimit=5
> >>> tickTime=2000
> >>> server.1=localhost:2721:2731:participant;localhost:2791
> >>> server.2=localhost:2722:2732:participant;localhost:2792
> >>>
> >>> server 2:
> >>> dataDir=/home/shralex/zk-sat/zookeeper2
> >>> standaloneEnabled=false
> >>> syncLimit=2
> >>> initLimit=5
> >>> tickTime=2000
> >>> server.2=localhost:2722:2732:participant;localhost:2792
> >>>
> >>> Starting server 2 first; it says it's the leader. Starting server 1,
> >>> then connecting to server 2 with a client and issuing a reconfig that
> >>> adds server 1.
> >>>
> >>> Alex
> >>>
> >>> On Fri, Jun 19, 2015 at 6:27 PM, Benjamin Anderson wrote:
> >>>
> >>>> Hi there - I'm working on automating bootstrapping of a 3-node ZK
> >>>> 3.5.0-alpha ensemble and I'm running into some problems with getting
> >>>> the nodes to join up. The dynamic configuration page[1] suggests that,
> >>>>
> >>>> "...it is possible to start a ZooKeeper ensemble containing a single
> >>>> participant and to dynamically grow it by adding more servers"
> >>>>
> >>>> which is what I'm attempting to do. I've found, however, that this can
> >>>> be rather problematic. What is the "correct" procedure for dynamically
> >>>> growing an ensemble from a single participant?
> >>>>
> >>>> I've tried two approaches:
> >>>>
> >>>> Approach A:
> >>>>
> >>>> 1. Start two nodes, one with myid=1 and one with myid=2. Each node's
> >>>> dynamicConfigFile contains a single line referring to itself, i.e.,
> >>>> neither node is aware of the other.
> >>>>
> >>>> 2. Open a zkCli to either of the two nodes and issue a `reconfig`
> >>>> command to add the other, unknown node.
> >>>>
> >>>> This method fails with "KeeperErrorCode = NewConfigNoQuorum for".
> >>>>
> >>>> Approach B:
> >>>>
> >>>> 1. Start one node with myid=1 and a dynamicConfigFile that only refers
> >>>> to itself, then start a second node with myid=2 and a
> >>>> dynamicConfigFile that refers to itself *and* the node with myid=1.
> >>>>
> >>>> 2. Open a zkCli to the node with myid=1 and issue a reconfig command
> >>>> to add the node with myid=2.
> >>>>
> >>>> This approach works! However, if the ordering is reversed (i.e., the
> >>>> myid=2 node boots first and refers only to itself, and the myid=1 node
> >>>> refers to both itself and the myid=2 node), then the myid=1 node will
> >>>> *never* come up cleanly - it hangs forever logging messages such as
> >>>> the one in this gist[2].
> >>>> In my environment the boot ordering is not guaranteed, so this is
> >>>> rather challenging for me.
> >>>>
> >>>> My baseline config is roughly this[3].
> >>>>
> >>>> Is there a well-known and reliable way to incrementally join nodes to
> >>>> a ZK ensemble in 3.5.0-alpha? Do I need to be using a newer version
> >>>> than the release cut back in August 2014?
> >>>>
> >>>> Thanks!
> >>>> --
> >>>> b
> >>>>
> >>>> [1]: http://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html
> >>>> [2]: https://gist.github.com/banjiewen/936f5620d33a8eb0ddf4
> >>>> [3]: https://gist.github.com/banjiewen/c7f11c749933ac1bab72
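
The quorum argument in the thread (a two-server ensemble needs both servers to ack, so a reconfig to {1, 2} is refused while the two servers are not talking) can be illustrated with a small Python sketch. This is only a model under stated assumptions, not ZooKeeper's implementation: the function names `majority` and `new_config_has_quorum` are hypothetical, and the sketch assumes plain majority quorums and that the leader accepts a reconfig only if a majority of the proposed ensemble is connected to it.

```python
from typing import Set


def majority(n: int) -> int:
    """Smallest number of servers that forms a majority of n voters."""
    return n // 2 + 1


def new_config_has_quorum(new_members: Set[int], connected: Set[int]) -> bool:
    """Hypothetical check: are enough members of the proposed ensemble
    connected to the current leader (the leader counts as connected)?"""
    return len(new_members & connected) >= majority(len(new_members))


# Approach A: server 1 leads its own one-server ensemble and server 2 never
# connects to it, so only server 1 counts toward the proposed {1, 2} ensemble.
print(new_config_has_quorum({1, 2}, connected={1}))      # False

# Approach B: server 2 lists server 1 in its own config, so it connects to
# server 1 before the reconfig is issued.
print(new_config_has_quorum({1, 2}, connected={1, 2}))   # True
```

In this model, `majority(2)` is 2, which matches the remark that in an ensemble of two servers both must ack every request.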