From: Mahadev Konar
To: , Henry Robinson
Reply-To: zookeeper-user@hadoop.apache.org
Date: Fri, 30 Apr 2010 13:33:54 -0700
Subject: Re: Question on maintaining leader/membership status in zookeeper

Hi Lei,

In this case, it's up to the application to decide what to do when this
happens. The application will be notified that it's disconnected from the
ZooKeeper cluster. In such a case, some applications might decide not to
proceed at all (since proceeding might lead to some state corruption), and
others might decide to use cached values, where stale values are fine for
the correctness of the system. It's up to you to decide what you want to do
in such a situation.

Also, you would usually want to set up ZooKeeper clusters in such a way that
this should not be possible, for example across switches. In that case, the
application will be able to access one of the servers in the ZooKeeper
cluster, and it will be highly unlikely that it isn't able to reach any one
of them.

Hope this helps.

Thanks
mahadev

On 4/30/10 1:26 PM, "Lei Gao" wrote:

> Hi Henry,
>
> I am not talking about the leader election within zookeeper cluster. I guess
> I didn't make the discussion context clear. In my case, I run a cluster that
> uses zookeeper for doing the leader election.
> Yes, nodes in my cluster are
> the clients of zookeeper. Those nodes depend on zookeeper to elect a new
> leader and figure out what the current leader is. So if the zookeeper (think
> of it as a stand-alone entity) becomes unavailable in the way I've described
> earlier, how can I handle such situation so my cluster can still function
> while a majority of nodes still connect to each other (but not to the
> zookeeper)?
>
> Thanks,
>
> Lei
>
>
> On 4/30/10 1:10 PM, "Henry Robinson" wrote:
>
>> Hi Lei -
>>
>> The 'user cluster' (by which I think you mean the set of clients of
>> ZooKeeper?) plays no part in leader election. If a majority of ZooKeeper
>> server nodes can talk to each other, a new leader can be elected. Clients of
>> the minority server partition will be disconnected - if they too cannot
>> reach the majority partition then they will not be able to reconnect.
>>
>> Hope this helps,
>> Henry
>>
>> On 30 April 2010 12:45, Lei Gao wrote:
>>
>>> Hi Ted,
>>>
>>> I 100% agree with what you said. But my question is more about what if my
>>> zookeeper service cluster is partitioned from a majority of nodes in my USER
>>> CLUSTER. In this case, the majority nodes in one network partition can't
>>> select a new leader because zookeeper is out of reach.
>>>
>>> Another example will be that if there is an asymmetric network failure
>>> where a majority of nodes from the USER CLUSTER can't reach the leader while
>>> the zookeeper still can. How does zookeeper handle such situation?
>>>
>>> Thanks,
>>>
>>> Lei
>>>
>>> On 4/30/10 12:24 PM, "Ted Dunning" wrote:
>>>
>>> There are a variety of situations that can trigger a new leader election
>>> and a few that can cause the cluster to be unable to elect a new leader.
>>> Isolation of just the leader is one of the situations that will cause a new
>>> leader election. Isolation of nodes into groups smaller than the quorum
>>> will result in the cluster freezing.
>>>
>>> On Fri, Apr 30, 2010 at 11:56 AM, Lei Gao wrote:
>>> Hi,
>>>
>>> I have a general question on how zookeeper can maintain its view of the
>>> user cluster (that zookeeper manages) that is consistent with the nodes in
>>> the user cluster. In other words, when zookeeper considers the current
>>> leader is unavailable, does it really guarantee that a majority of nodes in
>>> the user cluster can't reach the current leader? The same question applies
>>> to the membership service as well. Because the zookeeper can be partitioned
>>> from a majority of the nodes in the user cluster. How does the zookeeper
>>> handle situations like this?
>>>
>>> Thanks,
>>>
>>> Lei
>>>
>>>
>>>
>>
>
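
The two choices described in this thread for a client that has lost contact
with ZooKeeper (stop, or keep going on cached values) can be sketched roughly
as below, together with the majority rule behind the "groups smaller than the
quorum" remark. This is only a minimal illustration in plain Python, not
ZooKeeper client code: LeaderView, Policy, ConnState and has_quorum are
hypothetical names made up for this sketch, and the on_connected /
on_disconnected methods stand in for whatever connection-state notification
the real client library delivers. has_quorum assumes a simple strict-majority
quorum.

```python
from enum import Enum


class ConnState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"


class Policy(Enum):
    FAIL_STOP = "fail_stop"        # refuse to proceed while disconnected
    SERVE_CACHED = "serve_cached"  # stale values are acceptable


class LeaderView:
    """Hypothetical client-side view of who the current leader is."""

    def __init__(self, policy):
        self.policy = policy
        self.state = ConnState.DISCONNECTED
        self.cached_leader = None

    def on_connected(self, leader):
        # Stand-in for "connection (re)established and leader re-read".
        self.state = ConnState.CONNECTED
        self.cached_leader = leader

    def on_disconnected(self):
        # Stand-in for the notification that contact with ZooKeeper was lost.
        self.state = ConnState.DISCONNECTED

    def current_leader(self):
        if self.state is ConnState.CONNECTED:
            return self.cached_leader
        if self.policy is Policy.SERVE_CACHED and self.cached_leader is not None:
            # Possibly stale: only safe if the application tolerates that.
            return self.cached_leader
        raise RuntimeError("disconnected from ZooKeeper; refusing to proceed")


def has_quorum(reachable_servers, ensemble_size):
    """True if a group of servers is a strict majority of the ensemble."""
    return reachable_servers > ensemble_size // 2
```

With FAIL_STOP, a disconnected node raises instead of acting on a leader it
can no longer confirm; with SERVE_CACHED it keeps returning the last leader
it saw, which is only appropriate where stale values cannot corrupt state.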