zookeeper-user mailing list archives

From Mahadev Konar <maha...@yahoo-inc.com>
Subject Re: Question on maintaining leader/membership status in zookeeper
Date Fri, 30 Apr 2010 23:55:26 GMT
Hi Lei,
  ZooKeeper provides a set of primitives which allow you to do all kinds of
things! You might want to take a look at the API and some examples of
ZooKeeper recipes to see how it works; that will probably clear things
up for you.

Here are the links:



On 4/30/10 4:46 PM, "Lei Gao" <lgao@linkedin.com> wrote:

> Hi Mahadev,
> First of all, I'd like to thank you for being patient with me - my questions
> seem unclear to many of you who try to help me.
> I guess clients have to be smart enough to trigger a new leader election by
> trying to delete the znode. But in this case, ZK should not allow any single
> or multiple (as long as they are less than a quorum) client(s) to delete the
> znode corresponding to the master, right? A new consensus among clients (NOT
> among the nodes in zk cluster) has to be there for the znode to be deleted,
> right?  Does zk have this capability, or do the clients have to come to this
> consensus outside of zk before trying to delete the znode in zk?
> Thanks,
> Lei
>> Hi Lei,
>>  Sorry I misinterpreted your question! The scenario you describe could be
>> handled in such a way -
>> You could have a status node in ZooKeeper which every slave will subscribe
>> to and update! If one of the slave nodes sees that too many of the slaves
>> have had connections to the Leader refused, that slave could go ahead and
>> delete the Leader znode, and force the Leader to give up its leadership. I
>> am not describing a detailed way to do it, but it's not very hard to come up
>> with a design for this.
>> Do you intend to have the Leader and Slaves in different network-protected
>> zones (different ACLs, I mean)? In that case, it is a legitimate concern;
>> otherwise I do think an asymmetric network partition would be very unlikely
>> to happen. Do you usually see network partitions in such scenarios?
>> Thanks
>> mahadev
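
The status-node idea above leaves open how many slaves must agree before the
Leader znode is removed. Here is a minimal sketch of one possible rule, in
plain Python with no ZooKeeper calls. The in-memory `LeaderStatus` class, its
method names, and the strict-majority threshold are all assumptions made for
illustration; none of them come from the thread or from any ZooKeeper API. In
a real deployment the reports would presumably live in znodes, and that
mapping is not shown here.

```python
class LeaderStatus:
    """Toy, in-memory stand-in for a shared status node (hypothetical)."""

    def __init__(self, leader, members):
        self.leader = leader                # current leader, or None if deposed
        self.members = set(members)         # followers allowed to file reports
        self.unreachable_reports = set()    # followers that cannot reach the leader

    def report_unreachable(self, follower):
        """Record a report; depose the leader once a strict majority agrees.

        Returns True only when this report caused the leader to be removed.
        """
        if self.leader is None or follower not in self.members:
            return False
        self.unreachable_reports.add(follower)
        if len(self.unreachable_reports) > len(self.members) // 2:
            self.leader = None              # analogous to deleting the Leader znode
            self.unreachable_reports.clear()
            return True
        return False

    def report_reachable(self, follower):
        """Withdraw an earlier report once the follower reaches the leader again."""
        self.unreachable_reports.discard(follower)


def run_quorum_demo():
    status = LeaderStatus("leader", ["s1", "s2", "s3"])
    first = status.report_unreachable("s1")    # 1 of 3: not a majority yet
    second = status.report_unreachable("s2")   # 2 of 3: majority reached
    return first, second, status.leader
```

With three slaves, a single report leaves the leader in place and the second
report removes it, so one isolated slave cannot force a re-election on its own.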
>> On 4/30/10 4:05 PM, "Lei Gao" <lgao@linkedin.com> wrote:
>>> Hi Mahadev,
>>> Why would the leader be disconnected from ZK? ZK is fine communicating with
>>> the leader in this case. We are talking about asymmetric network failure.
>>> Yes. Leader could consider all the slaves being down if it tracks the status
>>> of all slaves itself. But I guess if ZK is used for membership
>>> management, neither the leader nor the slaves will be considered
>>> disconnected because they can all connect to ZK.
>>> Thanks,
>>> Lei  
>>> On 4/30/10 3:47 PM, "Mahadev Konar" <mahadev@yahoo-inc.com> wrote:
>>>> Hi Lei,
>>>> In this case, the Leader will be disconnected from the ZK cluster and will
>>>> give up its leadership. Since it's disconnected, the ZK cluster will
>>>> realize that the Leader is dead!
>>>> When the ZK cluster realizes that the Leader is dead (because it hasn't
>>>> heard from the Leader for a certain time, configurable via the session
>>>> timeout parameter), the slaves will be notified of this via watchers in
>>>> the ZooKeeper cluster. The slaves will realize that the Leader is gone,
>>>> re-elect a new Leader, and start working with the new Leader.
>>>> Does that answer your question?
>>>> You might want to look through the documentation of ZK to understand its
>>>> use cases and how it solves these kinds of issues.
>>>> Thanks
>>>> mahadev
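
The sequence described above (session timeout, watcher notification,
re-election) can be sketched as a toy simulation. This is plain Python and
does not use the ZooKeeper client library. `ToyCoordinator`, its methods, the
integer clock, and the timeout value of 5 are hypothetical names and numbers
chosen for illustration. Real ZooKeeper sessions, ephemeral znodes, and
watches carry more detail than this model shows.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}   # client name -> time of its last heartbeat
        self.leader = None         # owner of the ephemeral "leader" node
        self.watchers = []         # callbacks fired when the leader node vanishes

    def heartbeat(self, client, now):
        self.last_heartbeat[client] = now

    def try_become_leader(self, client):
        """Create the leader node if it is absent; return True on success."""
        if self.leader is None and client in self.last_heartbeat:
            self.leader = client
            return True
        return False

    def watch_leader(self, callback):
        self.watchers.append(callback)

    def tick(self, now):
        """Expire sessions that have been silent for longer than the timeout."""
        expired = [c for c, t in self.last_heartbeat.items()
                   if now - t > self.session_timeout]
        for client in expired:
            del self.last_heartbeat[client]
            if client == self.leader:
                self.leader = None                           # ephemeral node goes away
                watchers, self.watchers = self.watchers, []  # watches fire only once
                for callback in watchers:
                    callback()


def run_demo():
    zk = ToyCoordinator(session_timeout=5)
    for name in ("leader", "slave-1", "slave-2"):
        zk.heartbeat(name, now=0)
    zk.try_become_leader("leader")

    elected = []

    def on_leader_gone():
        # The first slave that manages to create the node becomes the new leader.
        for name in ("slave-1", "slave-2"):
            if zk.try_become_leader(name):
                elected.append(name)
                break

    zk.watch_leader(on_leader_gone)

    # The old leader goes silent; the slaves keep sending heartbeats.
    for now in range(1, 8):
        zk.heartbeat("slave-1", now)
        zk.heartbeat("slave-2", now)
        zk.tick(now)
    return zk.leader, elected
```

In this model the old leader loses its node only because the coordinator stops
hearing from it. A leader that can still reach the coordinator keeps the node,
which is the asymmetric-failure case raised later in the thread.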
>>>> On 4/30/10 2:08 PM, "Lei Gao" <lgao@linkedin.com> wrote:
>>>>> Thank you all for your answers. It clarifies a lot of my confusion about
>>>>> the service guarantees of ZK. I am still struggling with one failure case
>>>>> (I am not trying to be a pain in the neck. But I need to have a full
>>>>> understanding of what ZK can offer before I make a decision on whether
>>>>> to use it in my cluster.)
>>>>> Assume the following topology:
>>>>>          Leader  ==== ZK cluster
>>>>>               \\                    //
>>>>>                \\                  //
>>>>>                  \\               //
>>>>>                       Slave(s)
>>>>> If I have an asymmetric network failure such that the connection between
>>>>> Leader and Slave(s) is broken while all other connections are still alive,
>>>>> would my system hang after some point? Because no new leader election will
>>>>> be initiated by slaves and the leader can't get the work to slave(s).
>>>>> Thanks,
>>>>> Lei
>>>>> On 4/30/10 1:54 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>>>>>> If one of your user clients can no longer reach one member of the
>>>>>> cluster, then it will try to reach another.  If it succeeds, then it will
>>>>>> continue without any problems as long as the ZK cluster itself is
>>>>>> This applies for all the ZK recipes.  You will have to be a little
>>>>>> careful to handle connection loss, but that should get easier soon (and
>>>>>> isn't all that difficult anyway).
>>>>>> On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao <lgao@linkedin.com> wrote:
>>>>>>> I am not talking about the leader election within zookeeper cluster. I
>>>>>>> guess I didn't make the discussion context clear. In my case, I run a
>>>>>>> cluster that uses zookeeper for doing the leader election. Yes, nodes in
>>>>>>> my cluster are the clients of zookeeper.  Those nodes depend on zookeeper
>>>>>>> to elect a new leader and figure out what the current leader is. So if
>>>>>>> the zookeeper (think of it as a stand-alone entity) becomes unavailable
>>>>>>> in the way I described earlier, how can I handle such a situation so my
>>>>>>> cluster can still function while a majority of nodes still connect to
>>>>>>> each other (but not to the zookeeper)?
