From: Mahadev Konar
To: Lei Gao, zookeeper-user@hadoop.apache.org
Date: Fri, 30 Apr 2010 16:55:26 -0700
Subject: Re: Question on maintaining leader/membership status in zookeeper

Hi Lei,
  ZooKeeper provides a set of primitives that allows you to do all kinds of
things! You might want to take a look at the API and at some examples of
ZooKeeper recipes to see how it works; that will probably clear things up
for you. Here is the link:

http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html

Thanks
mahadev
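For a concrete (if simplified) picture of the election recipe that page
describes, a minimal sketch using the standard Java client might look like
the following. The /election path, node names, and the 5-second session
timeout are placeholders, and to stay short the sketch watches the current
leader znode directly, so it still has the herd effect the recipe page
warns about.

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch of the election recipe: every candidate creates an ephemeral
    // sequential znode under /election; the lowest sequence number leads.
    public class ElectionSketch implements Watcher {

        private static final String ELECTION_ROOT = "/election"; // assumed to exist

        private final ZooKeeper zk;
        private String myNode;

        public ElectionSketch(String connectString) throws Exception {
            // The 5000 ms session timeout bounds how long a dead leader's
            // ephemeral znode lingers before the survivors are notified.
            this.zk = new ZooKeeper(connectString, 5000, this);
        }

        public void volunteer() throws KeeperException, InterruptedException {
            // Ephemeral: the znode is removed automatically if our session expires.
            myNode = zk.create(ELECTION_ROOT + "/n_", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            checkLeadership();
        }

        private void checkLeadership() throws KeeperException, InterruptedException {
            List<String> children = zk.getChildren(ELECTION_ROOT, false);
            Collections.sort(children);
            String smallest = ELECTION_ROOT + "/" + children.get(0);
            if (smallest.equals(myNode)) {
                System.out.println("I am the leader: " + myNode);
            } else if (zk.exists(smallest, this) == null) {
                // The leader znode vanished between the two calls; look again.
                checkLeadership();
            }
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDeleted) {
                try {
                    checkLeadership(); // the old leader is gone; maybe it is us now
                } catch (Exception e) {
                    // A real client would retry here on connection loss.
                }
            }
        }
    }

The key points are the ephemeral sequential znodes (the leader's znode
disappears on its own when the leader's session expires) and the watch that
tells the surviving candidates to re-run the check.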
On 4/30/10 4:46 PM, "Lei Gao" wrote:

> Hi Mahadev,
>
> First of all, I would like to thank you for being patient with me - my
> questions seem unclear to many of you who try to help me.
>
> I guess clients have to be smart enough to trigger a new leader election
> by trying to delete the znode. But in this case, ZK should not allow any
> single client, or multiple clients (as long as they are fewer than a
> quorum), to delete the znode corresponding to the master, right? A new
> consensus among the clients (NOT among the nodes in the ZK cluster) has
> to be there for the znode to be deleted, right? Does ZK have this
> capability, or do the clients have to come to this consensus outside of
> ZK before trying to delete the znode in ZK?
>
> Thanks,
>
> Lei
>
>> Hi Lei,
>>  Sorry, I misinterpreted your question! The scenario you describe could
>> be handled in the following way:
>>
>> You could have a status node in ZooKeeper that every slave subscribes
>> to and updates. If one of the slave nodes sees that too many of the
>> slaves' connections to the Leader have been refused, that slave could go
>> ahead and delete the Leader znode, forcing the Leader to give up its
>> leadership. I am not describing a detailed way to do it, but it's not
>> very hard to come up with a design for this.
>>
>> Do you intend to have the Leader and Slaves in differently
>> network-protected zones (different ACLs, I mean)? In that case it is a
>> legitimate concern; otherwise I think an asymmetric network partition
>> would be very unlikely to happen.
>>
>> Do you usually see network partitions in such scenarios?
>>
>> Thanks
>> mahadev
>>
>> On 4/30/10 4:05 PM, "Lei Gao" wrote:
>>
>>> Hi Mahadev,
>>>
>>> Why would the leader be disconnected from ZK? ZK is fine communicating
>>> with the leader in this case. We are talking about an asymmetric
>>> network failure. Yes, the leader could consider all the slaves to be
>>> down if it tracks the status of all the slaves itself. But I guess if
>>> ZK is used for membership management, neither the leader nor the slaves
>>> will be considered disconnected, because they can all connect to ZK.
>>>
>>> Thanks,
>>>
>>> Lei
>>>
>>> On 4/30/10 3:47 PM, "Mahadev Konar" wrote:
>>>
>>>> Hi Lei,
>>>>
>>>> In this case, the Leader will be disconnected from the ZK cluster and
>>>> will give up its leadership. Since it is disconnected, the ZK cluster
>>>> will realize that the Leader is dead.
>>>>
>>>> When the ZK cluster realizes that the Leader is dead (because the ZK
>>>> cluster hasn't heard from the Leader for a certain time, configurable
>>>> via the session timeout parameter), the slaves will be notified of
>>>> this via watchers in the ZooKeeper cluster. The slaves will realize
>>>> that the Leader is gone, will re-elect a new Leader, and will start
>>>> working with the new Leader.
>>>>
>>>> Does that answer your question?
>>>>
>>>> You might want to look through the ZK documentation to understand its
>>>> use cases and how it solves these kinds of issues.
>>>>
>>>> Thanks
>>>> mahadev
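Lei's question above, that no single slave should be able to depose the
leader on its own, can be folded into the status-node idea Mahadev sketches
earlier in the thread. The following is only one hedged way to do it, with
made-up paths and a totalSlaves count assumed to be known out of band: each
slave that cannot reach the leader files an ephemeral "complaint" znode, and
the leader znode is deleted only once a majority of the slaves have
complained.

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch: the leader znode is deleted only after a majority of slaves
    // have filed a "cannot reach the leader" complaint, so no single slave
    // can force a re-election by itself.
    public class LeaderComplaints {

        private static final String COMPLAINTS = "/leader-complaints"; // assumed to exist
        private static final String LEADER_NODE = "/election/leader";  // placeholder

        private final ZooKeeper zk;
        private final int totalSlaves; // assumed known out of band

        public LeaderComplaints(ZooKeeper zk, int totalSlaves) {
            this.zk = zk;
            this.totalSlaves = totalSlaves;
        }

        // A slave calls this each time it fails to reach the leader directly.
        public void reportLeaderUnreachable(String slaveId)
                throws KeeperException, InterruptedException {
            try {
                // Ephemeral: the complaint disappears again if this slave dies.
                zk.create(COMPLAINTS + "/" + slaveId, new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            } catch (KeeperException.NodeExistsException e) {
                // This slave has already complained; nothing more to do.
            }

            List<String> complaints = zk.getChildren(COMPLAINTS, false);
            if (complaints.size() > totalSlaves / 2) {
                try {
                    // A majority of slaves cannot reach the leader: depose it.
                    zk.delete(LEADER_NODE, -1); // -1 = any version
                } catch (KeeperException.NoNodeException e) {
                    // Someone else already deleted it; a new election is under way.
                }
            }
        }
    }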
>>>> On 4/30/10 2:08 PM, "Lei Gao" wrote:
>>>>
>>>>> Thank you all for your answers. They clarify a lot of my confusion
>>>>> about the service guarantees of ZK. I am still struggling with one
>>>>> failure case. (I am not trying to be a pain in the neck, but I need
>>>>> to have a full understanding of what ZK can offer before I make a
>>>>> decision on whether to use it in my cluster.)
>>>>>
>>>>> Assume the following topology:
>>>>>
>>>>>   Leader ==== ZK cluster
>>>>>      \\        //
>>>>>       \\      //
>>>>>        \\    //
>>>>>        Slave(s)
>>>>>
>>>>> If I have an asymmetric network failure such that the connections
>>>>> between the Leader and the Slave(s) are broken while all other
>>>>> connections are still alive, would my system hang after some point?
>>>>> Because no new leader election will be initiated by the slaves, and
>>>>> the leader can't get work to the slave(s).
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Lei
>>>>>
>>>>> On 4/30/10 1:54 PM, "Ted Dunning" wrote:
>>>>>
>>>>>> If one of your user clients can no longer reach one member of the ZK
>>>>>> cluster, then it will try to reach another. If it succeeds, then it
>>>>>> will continue without any problems as long as the ZK cluster itself
>>>>>> is OK.
>>>>>>
>>>>>> This applies to all the ZK recipes. You will have to be a little bit
>>>>>> careful to handle connection loss, but that should get easier soon
>>>>>> (and isn't all that difficult anyway).
>>>>>>
>>>>>> On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao wrote:
>>>>>>
>>>>>>> I am not talking about the leader election within the ZooKeeper
>>>>>>> cluster. I guess I didn't make the discussion context clear. In my
>>>>>>> case, I run a cluster that uses ZooKeeper for doing the leader
>>>>>>> election. Yes, the nodes in my cluster are the clients of
>>>>>>> ZooKeeper. Those nodes depend on ZooKeeper to elect a new leader
>>>>>>> and figure out what the current leader is. So if ZooKeeper (think
>>>>>>> of it as a stand-alone entity) becomes unavailable in the way I've
>>>>>>> described earlier, how can I handle such a situation so my cluster
>>>>>>> can still function while a majority of nodes are still connected
>>>>>>> to each other (but not to the ZooKeeper)?
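As a rough illustration of Ted's point about connection loss: a
ConnectionLossException from the Java client usually just means the client
is failing over to another server in the ensemble, so an operation can
simply be retried a few times (the retry count, sleep, and path below are
arbitrary). A session expiration, by contrast, means any ephemeral znodes
are gone and the caller has to rebuild its state.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch: retry a read across ConnectionLoss, which the client library
    // reports while it is switching over to another ZooKeeper server.
    public class RetryingRead {

        public static byte[] readWithRetry(ZooKeeper zk, String path)
                throws KeeperException, InterruptedException {
            int attempts = 0;
            while (true) {
                try {
                    return zk.getData(path, false, null);
                } catch (KeeperException.ConnectionLossException e) {
                    if (++attempts >= 3) {
                        throw e; // give up after a few tries
                    }
                    Thread.sleep(1000); // give the client time to reconnect
                }
            }
        }
    }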