Hi Lei,
ZooKeeper provides a set of primitives which allow you to build all kinds of
coordination schemes. You might want to take a look at the API and some
examples of ZooKeeper recipes to see how it works; that will probably clear
things up for you.
Here is the link:
http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html
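
To make the "clients agree before the znode is deleted" idea concrete, here
is a rough sketch of how it could be layered on top of those primitives. As
far as I know ZooKeeper does not enforce this kind of client-side quorum by
itself, so the clients have to cooperate. Everything below is an assumption
for illustration: FakeZooKeeper is a hypothetical in-memory stand-in (not the
real client API), and the /leader and /complaints paths and the
report_leader_unreachable helper are made-up names.

```python
class FakeZooKeeper:
    """Hypothetical in-memory stand-in for a ZK ensemble (NOT the real API)."""

    def __init__(self):
        self.nodes = {}  # path -> owning session id (ephemeral owner)

    def create(self, path, owner):
        if path in self.nodes:
            raise KeyError("node exists: " + path)
        self.nodes[path] = owner

    def delete(self, path):
        self.nodes.pop(path, None)

    def exists(self, path):
        return path in self.nodes

    def children(self, parent):
        prefix = parent.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )


def report_leader_unreachable(zk, slave_id, total_slaves):
    """Record one complaint per slave; depose the leader on a majority.

    Returns True if this call deleted the (assumed) /leader znode.
    """
    vote = "/complaints/" + slave_id
    if not zk.exists(vote):
        # In a real deployment this would be an ephemeral znode, so the
        # complaint disappears if the complaining slave itself dies.
        zk.create(vote, owner=slave_id)
    votes = len(zk.children("/complaints"))
    if votes > total_slaves // 2 and zk.exists("/leader"):
        zk.delete("/leader")  # forces a new election among the clients
        for child in zk.children("/complaints"):
            zk.delete("/complaints/" + child)  # reset for the next leader
        return True
    return False


if __name__ == "__main__":
    zk = FakeZooKeeper()
    zk.create("/leader", owner="leader-1")
    for slave in ("s1", "s2"):
        deposed = report_leader_unreachable(zk, slave, total_slaves=3)
        print(slave, "deposed leader:", deposed)
```

Note that the count-then-delete step here is not atomic; against a real
ensemble you would have to handle two slaves reaching the threshold at the
same time, and nothing stops a misbehaving client from deleting the znode
directly.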
Thanks
mahadev
On 4/30/10 4:46 PM, "Lei Gao" <lgao@linkedin.com> wrote:
> Hi Mahadev,
>
> First of all, I'd like to thank you for being patient with me - my questions
> seem unclear to many of you who are trying to help me.
>
> I guess clients have to be smart enough to trigger a new leader election by
> trying to delete the znode. But in this case, ZK should not allow any single
> client, or any group of clients smaller than a quorum, to delete the znode
> corresponding to the master, right? A new consensus among the clients (NOT
> among the nodes in the ZK cluster) has to be reached before the znode can be
> deleted, right? Does ZK have this capability, or do the clients have to reach
> this consensus outside of ZK before trying to delete the znode in ZK?
>
> Thanks,
>
> Lei
>
>> Hi Lei,
>> Sorry, I misinterpreted your question! The scenario you describe could be
>> handled this way:
>>
>> You could have a status node in ZooKeeper which every slave subscribes to
>> and updates. If one of the slave nodes sees that too many of the slaves'
>> connections to the Leader have been refused, that slave could go ahead and
>> delete the Leader znode, and force the Leader to give up its leadership. I
>> am not describing a detailed way to do it, but it's not very hard to come up
>> with a design for this.
>>
>>
>>
>> Do you intend to have the Leader and Slaves in different protected network
>> zones (with different ACLs, I mean)? In that case it is a legitimate
>> concern; otherwise I do think an asymmetric network partition would be very
>> unlikely to happen.
>>
>> Do you usually see network partitions in such scenarios?
>>
>> Thanks
>> mahadev
>>
>>
>> On 4/30/10 4:05 PM, "Lei Gao" <lgao@linkedin.com> wrote:
>>
>>> Hi Mahadev,
>>>
>>> Why would the leader be disconnected from ZK? ZK is fine communicating with
>>> the leader in this case. We are talking about an asymmetric network failure.
>>> Yes, the leader could consider all the slaves to be down if it tracks the
>>> status of all slaves itself. But I guess if ZK is used for membership
>>> management, neither the leader nor the slaves will be considered
>>> disconnected, because they can all connect to ZK.
>>>
>>> Thanks,
>>>
>>> Lei
>>>
>>>
>>> On 4/30/10 3:47 PM, "Mahadev Konar" <mahadev@yahoo-inc.com> wrote:
>>>
>>>> Hi Lei,
>>>>
>>>> In this case, the Leader will be disconnected from the ZK cluster and will
>>>> give up its leadership. Since it's disconnected, the ZK cluster will
>>>> realize that the Leader is dead.
>>>>
>>>> When the ZK cluster realizes that the Leader is dead (because the ZK
>>>> cluster hasn't heard from the Leader for a certain time, configurable via
>>>> the session timeout parameter), the slaves will be notified of this via
>>>> watchers in the ZooKeeper cluster. The slaves will realize that the Leader
>>>> is gone, elect a new Leader, and start working with the new Leader.
>>>>
>>>> Does that answer your question?
>>>>
>>>> You might want to look through the ZK documentation to understand its use
>>>> cases and how it solves these kinds of issues.
>>>>
>>>> Thanks
>>>> mahadev
>>>>
>>>>
>>>> On 4/30/10 2:08 PM, "Lei Gao" <lgao@linkedin.com> wrote:
>>>>
>>>>> Thank you all for your answers. They clear up a lot of my confusion
>>>>> about the service guarantees of ZK. I am still struggling with one
>>>>> failure case. (I am not trying to be a pain in the neck, but I need to
>>>>> have a full understanding of what ZK can offer before I make a decision
>>>>> on whether to use it in my cluster.)
>>>>>
>>>>> Assume the following topology:
>>>>>
>>>>> Leader ==== ZK cluster
>>>>>    \\          //
>>>>>     \\        //
>>>>>      \\      //
>>>>>      Slave(s)
>>>>>
>>>>> If I have an asymmetric network failure such that the connections
>>>>> between the Leader and the Slave(s) are broken while all other
>>>>> connections are still alive, would my system hang at some point? No new
>>>>> leader election will be initiated by the slaves, and the leader can't
>>>>> get the work to the slave(s).
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Lei
>>>>>
>>>>> On 4/30/10 1:54 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>>>>>
>>>>>> If one of your user clients can no longer reach one member of the ZK
>>>>>> cluster, then it will try to reach another. If it succeeds, then it
>>>>>> will continue without any problems as long as the ZK cluster itself is
>>>>>> OK.
>>>>>>
>>>>>> This applies to all the ZK recipes. You will have to be a little bit
>>>>>> careful to handle connection loss, but that should get easier soon (and
>>>>>> isn't all that difficult anyway).
>>>>>>
>>>>>> On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao <lgao@linkedin.com> wrote:
>>>>>>
>>>>>>> I am not talking about the leader election within the zookeeper
>>>>>>> cluster. I guess I didn't make the discussion context clear. In my
>>>>>>> case, I run a cluster that uses zookeeper for doing the leader
>>>>>>> election. Yes, nodes in my cluster are the clients of zookeeper.
>>>>>>> Those nodes depend on zookeeper to elect a new leader and figure out
>>>>>>> what the current leader is. So if zookeeper (think of it as a
>>>>>>> stand-alone entity) becomes unavailable in the way I've described
>>>>>>> earlier, how can I handle such a situation so my cluster can still
>>>>>>> function while a majority of nodes still connect to each other (but
>>>>>>> not to zookeeper)?
>>>>>>>
>>>>>
>>>>
>>>
>>
>