From: Alexander Shraer
Date: Fri, 21 Jun 2013 23:47:39 -0700
Subject: Re: Zookeeper Configuration Sync
To: user@zookeeper.apache.org

Hi Mohammad,

+1 for the unique ensemble identifier request. We actually discussed this a
long time ago but somehow never got around to doing it. Can you open a JIRA
for this?

Suppose that server A has the latest log but only talks with server B during
leader election (C is down or slow). A doesn't know whether or not the latest
operations in its log are committed (if they were, C would have them, but A
has no way of knowing that). So, to be safe, everything in A's log gets
committed in this case.

We took the approach that a reconfiguration is treated similarly to normal
data updates. When a server has the most up-to-date log and talks with a
majority during leader election, it will be elected leader and commit its log
to the other servers. It won't truncate its log even if it's clear that some
operations were not committed. This is true for normal updates as well as for
reconfigurations.

BTW, I'm not sure why you are shutting down servers or clearing the data
during reconfigurations, or why you're manually changing config files. You
can add servers to the ensemble by invoking the reconfig command, and this
will make all the necessary changes to the config files, including specifying
the right config version.

Alex

On Fri, Jun 21, 2013 at 3:00 PM, Mohammad Shamma wrote:
> I have a use case where I dynamically grow a zookeeper ensemble on the same
> fixed set of machines multiple times. In each iteration, the ensemble is
> grown incrementally till it consists of "n" servers.
> I will refer to the machines hosting the servers as zk-1, zk-2, ..., zk-n.
>
> At the beginning of each iteration, I wipe out the zookeeper data
> directories of zk-1 and zk-2, then statically configure the zookeeper
> servers on both of them to form a 2-way ensemble. After that, I start
> growing the ensemble incrementally by reconfiguring the zookeeper ensemble
> to include zk-i, then clearing, configuring and starting the zookeeper
> server on zk-i (that is for i in range(2,n)).
>
> I was not shutting down or cleaning up the previous ensemble's zookeeper
> servers at the end of each iteration. After initializing the 2-way ensemble
> on zk-1 and zk-2, I observed that the servers from the old deployment were
> contacting the servers of the new ensemble and triggering an ensemble
> reconfiguration. A quick look at the code seems to suggest that this is
> triggered simply because the config version value of the old deployment
> server is higher than that found on the new ensemble servers. Can anyone
> confirm my understanding of this behaviour of zookeeper?
>
> I also noticed that this reconfiguration holds true for n=3. For example,
> let's say the zookeeper servers on zk-1 and zk-2 are freshly configured to
> form a 2-way ensemble, and zk-3 contains a leftover server that was part of
> an older 3-way ensemble (that included two obsolete servers on zk-1 and
> zk-2). To me it seems a bit counterintuitive for one server (on zk-3) to
> drive the configuration of two other servers (zk-1, zk-2). The reason why
> it seems counterintuitive is that the majority of the servers in the
> ensemble agree on a different config version. Can somebody explain how
> zookeeper handles this situation?
>
> One final note, it would be really useful if a zookeeper ensemble would
> have a unique identifier that could be set in the "zoo.cfg" file.
> Whenever servers communicate with each other, they would verify that they
> are talking to peers of the same ensemble before proceeding with further
> actions. Does that sound like a reasonable request?
>
> Thanks,
>
> --
> Mohammad Shamma
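
To make the ensemble identifier request concrete, here is a minimal toy
sketch of the idea discussed in this thread. It is not ZooKeeper code: the
Peer class, its field names, and the receive() method are all hypothetical,
and the "higher config version wins" rule is only a simplification of the
behaviour described above. It assumes a server would compare an identifier
before looking at a peer's config version.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Peer:
    """Toy model of a server. All names here are hypothetical."""
    name: str
    ensemble_id: Optional[str]  # proposed identifier; None models no check
    config_version: int
    members: FrozenSet[str]

    def receive(self, other: "Peer") -> bool:
        """Handle a message from `other`; return True if our config changed."""
        # Proposed guard: ignore peers that belong to a different ensemble.
        if self.ensemble_id is not None and other.ensemble_id != self.ensemble_id:
            return False
        # Simplified rule from the thread: a higher config version wins.
        if other.config_version > self.config_version:
            self.config_version = other.config_version
            self.members = other.members
            return True
        return False


if __name__ == "__main__":
    old_members = frozenset({"zk-1", "zk-2", "zk-3"})
    new_members = frozenset({"zk-1", "zk-2"})

    # Without an identifier, the leftover server on zk-3 overrides zk-1.
    fresh = Peer("zk-1", None, 1, new_members)
    stale = Peer("zk-3", None, 7, old_members)
    print(fresh.receive(stale))  # True

    # With an identifier, the stale peer from the old run is ignored.
    fresh = Peer("zk-1", "run-2", 1, new_members)
    stale = Peer("zk-3", "run-1", 7, old_members)
    print(fresh.receive(stale))  # False
```

In this sketch the identifier is checked first, so a stale server with a
higher config version can no longer pull a freshly configured server into
its old configuration.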