From: Mahadev Konar
To: Lei Gao, zookeeper-user@hadoop.apache.org
Date: Fri, 30 Apr 2010 16:55:26 -0700
Subject: Re: Question on maintaining leader/membership status in zookeeper

Hi Lei,
  ZooKeeper provides a set of primitives that allows you to do all kinds of
things! You might want to take a look at the API and at some examples of
ZooKeeper recipes to see how it works; that will probably clear things up
for you. Here is the link:

http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html

Thanks
mahadev
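For a concrete (if simplified) picture of the election recipe that page
describes, a minimal sketch using the standard Java client might look like
the following. The /election path, node names, and the 5-second session
timeout are placeholders, and to stay short the sketch watches the current
leader znode directly, so it still has the herd effect the recipe page
warns about.

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch of the election recipe: every candidate creates an ephemeral
    // sequential znode under /election; the lowest sequence number leads.
    public class ElectionSketch implements Watcher {

        private static final String ELECTION_ROOT = "/election"; // assumed to exist

        private final ZooKeeper zk;
        private String myNode;

        public ElectionSketch(String connectString) throws Exception {
            // The 5000 ms session timeout bounds how long a dead leader's
            // ephemeral znode lingers before the survivors are notified.
            this.zk = new ZooKeeper(connectString, 5000, this);
        }

        public void volunteer() throws KeeperException, InterruptedException {
            // Ephemeral: the znode is removed automatically if our session expires.
            myNode = zk.create(ELECTION_ROOT + "/n_", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            checkLeadership();
        }

        private void checkLeadership() throws KeeperException, InterruptedException {
            List<String> children = zk.getChildren(ELECTION_ROOT, false);
            Collections.sort(children);
            String smallest = ELECTION_ROOT + "/" + children.get(0);
            if (smallest.equals(myNode)) {
                System.out.println("I am the leader: " + myNode);
            } else if (zk.exists(smallest, this) == null) {
                // The leader znode vanished between the two calls; look again.
                checkLeadership();
            }
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDeleted) {
                try {
                    checkLeadership(); // the old leader is gone; maybe it is us now
                } catch (Exception e) {
                    // A real client would retry here on connection loss.
                }
            }
        }
    }

The key points are the ephemeral sequential znodes (the leader's znode
disappears on its own when the leader's session expires) and the watch that
tells the surviving candidates to re-run the check.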
On 4/30/10 4:46 PM, "Lei Gao" wrote:

> Hi Mahadev,
>
> First of all, I would like to thank you for being patient with me - my
> questions seem unclear to many of you who try to help me.
>
> I guess clients have to be smart enough to trigger a new leader election
> by trying to delete the znode. But in this case, ZK should not allow any
> single client, or multiple clients (as long as they are fewer than a
> quorum), to delete the znode corresponding to the master, right? A new
> consensus among the clients (NOT among the nodes in the ZK cluster) has
> to be there for the znode to be deleted, right? Does ZK have this
> capability, or do the clients have to come to this consensus outside of
> ZK before trying to delete the znode in ZK?
>
> Thanks,
>
> Lei
>
>> Hi Lei,
>>  Sorry, I misinterpreted your question! The scenario you describe could
>> be handled in the following way:
>>
>> You could have a status node in ZooKeeper that every slave subscribes
>> to and updates. If one of the slave nodes sees that too many of the
>> slaves' connections to the Leader have been refused, that slave could go
>> ahead and delete the Leader znode, forcing the Leader to give up its
>> leadership. I am not describing a detailed way to do it, but it's not
>> very hard to come up with a design for this.
>>
>> Do you intend to have the Leader and Slaves in differently
>> network-protected zones (different ACLs, I mean)? In that case it is a
>> legitimate concern; otherwise I think an asymmetric network partition
>> would be very unlikely to happen.
>>
>> Do you usually see network partitions in such scenarios?
>>
>> Thanks
>> mahadev
>>
>> On 4/30/10 4:05 PM, "Lei Gao" wrote:
>>
>>> Hi Mahadev,
>>>
>>> Why would the leader be disconnected from ZK? ZK is fine communicating
>>> with the leader in this case. We are talking about an asymmetric
>>> network failure. Yes, the leader could consider all the slaves to be
>>> down if it tracks the status of all the slaves itself. But I guess if
>>> ZK is used for membership management, neither the leader nor the slaves
>>> will be considered disconnected, because they can all connect to ZK.
>>>
>>> Thanks,
>>>
>>> Lei
>>>
>>> On 4/30/10 3:47 PM, "Mahadev Konar" wrote:
>>>
>>>> Hi Lei,
>>>>
>>>> In this case, the Leader will be disconnected from the ZK cluster and
>>>> will give up its leadership. Since it is disconnected, the ZK cluster
>>>> will realize that the Leader is dead.
>>>>
>>>> When the ZK cluster realizes that the Leader is dead (because the ZK
>>>> cluster hasn't heard from the Leader for a certain time, configurable
>>>> via the session timeout parameter), the slaves will be notified of
>>>> this via watchers in the ZooKeeper cluster. The slaves will realize
>>>> that the Leader is gone, will re-elect a new Leader, and will start
>>>> working with the new Leader.
>>>>
>>>> Does that answer your question?
>>>>
>>>> You might want to look through the ZK documentation to understand its
>>>> use cases and how it solves these kinds of issues.
>>>>
>>>> Thanks
>>>> mahadev
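Lei's question above, that no single slave should be able to depose the
leader on its own, can be folded into the status-node idea Mahadev sketches
earlier in the thread. The following is only one hedged way to do it, with
made-up paths and a totalSlaves count assumed to be known out of band: each
slave that cannot reach the leader files an ephemeral "complaint" znode, and
the leader znode is deleted only once a majority of the slaves have
complained.

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch: the leader znode is deleted only after a majority of slaves
    // have filed a "cannot reach the leader" complaint, so no single slave
    // can force a re-election by itself.
    public class LeaderComplaints {

        private static final String COMPLAINTS = "/leader-complaints"; // assumed to exist
        private static final String LEADER_NODE = "/election/leader";  // placeholder

        private final ZooKeeper zk;
        private final int totalSlaves; // assumed known out of band

        public LeaderComplaints(ZooKeeper zk, int totalSlaves) {
            this.zk = zk;
            this.totalSlaves = totalSlaves;
        }

        // A slave calls this each time it fails to reach the leader directly.
        public void reportLeaderUnreachable(String slaveId)
                throws KeeperException, InterruptedException {
            try {
                // Ephemeral: the complaint disappears again if this slave dies.
                zk.create(COMPLAINTS + "/" + slaveId, new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            } catch (KeeperException.NodeExistsException e) {
                // This slave has already complained; nothing more to do.
            }

            List<String> complaints = zk.getChildren(COMPLAINTS, false);
            if (complaints.size() > totalSlaves / 2) {
                try {
                    // A majority of slaves cannot reach the leader: depose it.
                    zk.delete(LEADER_NODE, -1); // -1 = any version
                } catch (KeeperException.NoNodeException e) {
                    // Someone else already deleted it; a new election is under way.
                }
            }
        }
    }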
>>>> On 4/30/10 2:08 PM, "Lei Gao" wrote:
>>>>
>>>>> Thank you all for your answers. They clarify a lot of my confusion
>>>>> about the service guarantees of ZK. I am still struggling with one
>>>>> failure case. (I am not trying to be a pain in the neck, but I need
>>>>> to have a full understanding of what ZK can offer before I make a
>>>>> decision on whether to use it in my cluster.)
>>>>>
>>>>> Assume the following topology:
>>>>>
>>>>>   Leader ==== ZK cluster
>>>>>      \\        //
>>>>>       \\      //
>>>>>        \\    //
>>>>>        Slave(s)
>>>>>
>>>>> If I have an asymmetric network failure such that the connections
>>>>> between the Leader and the Slave(s) are broken while all other
>>>>> connections are still alive, would my system hang after some point?
>>>>> Because no new leader election will be initiated by the slaves, and
>>>>> the leader can't get work to the slave(s).
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Lei
>>>>>
>>>>> On 4/30/10 1:54 PM, "Ted Dunning" wrote:
>>>>>
>>>>>> If one of your user clients can no longer reach one member of the ZK
>>>>>> cluster, then it will try to reach another. If it succeeds, then it
>>>>>> will continue without any problems as long as the ZK cluster itself
>>>>>> is OK.
>>>>>>
>>>>>> This applies to all the ZK recipes. You will have to be a little bit
>>>>>> careful to handle connection loss, but that should get easier soon
>>>>>> (and isn't all that difficult anyway).
>>>>>>
>>>>>> On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao wrote:
>>>>>>
>>>>>>> I am not talking about the leader election within the ZooKeeper
>>>>>>> cluster. I guess I didn't make the discussion context clear. In my
>>>>>>> case, I run a cluster that uses ZooKeeper for doing the leader
>>>>>>> election. Yes, the nodes in my cluster are the clients of
>>>>>>> ZooKeeper. Those nodes depend on ZooKeeper to elect a new leader
>>>>>>> and figure out what the current leader is. So if ZooKeeper (think
>>>>>>> of it as a stand-alone entity) becomes unavailable in the way I've
>>>>>>> described earlier, how can I handle such a situation so my cluster
>>>>>>> can still function while a majority of nodes are still connected
>>>>>>> to each other (but not to the ZooKeeper)?
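As a rough illustration of Ted's point about connection loss: a
ConnectionLossException from the Java client usually just means the client
is failing over to another server in the ensemble, so an operation can
simply be retried a few times (the retry count, sleep, and path below are
arbitrary). A session expiration, by contrast, means any ephemeral znodes
are gone and the caller has to rebuild its state.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch: retry a read across ConnectionLoss, which the client library
    // reports while it is switching over to another ZooKeeper server.
    public class RetryingRead {

        public static byte[] readWithRetry(ZooKeeper zk, String path)
                throws KeeperException, InterruptedException {
            int attempts = 0;
            while (true) {
                try {
                    return zk.getData(path, false, null);
                } catch (KeeperException.ConnectionLossException e) {
                    if (++attempts >= 3) {
                        throw e; // give up after a few tries
                    }
                    Thread.sleep(1000); // give the client time to reconnect
                }
            }
        }
    }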