zookeeper-user mailing list archives

From Mate Szalay-Beko <msza...@cloudera.com.INVALID>
Subject Re: Participant selection mechanism when implementing Java leader election using zookeeper.
Date Thu, 05 Dec 2019 09:01:19 GMT
Hi Isuru,

>  "what prevents a slightly late node joining the election".

Actually, nothing. It is completely normal for nodes to join the election
process later. E.g. consider the case when you have just restarted one of
your nodes. It will reconnect to ZooKeeper and create a new ephemeral
sequential znode. (Its previous znode was already deleted when it closed its
ZooKeeper connection, because the znode was created as "ephemeral", which
means that ZooKeeper automatically deletes it when the session of its
creator ends.)

The leader election recipe specifies that when the current leader goes
offline, its znode is deleted. All the nodes watching it are notified of
the delete event, so they can query the children of '/election' and see
which znode now has the smallest sequence number. The node that owns that
znode is the new leader.
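
As a rough sketch of the "smallest sequence number wins" step: the helper
below takes the child names you would get back from listing '/election' and
picks the winner. It uses only the JDK, not the ZooKeeper client. The "n_"
prefix and the class and method names are made up for illustration; the
10-digit zero-padded suffix is the format ZooKeeper appends to sequential
znodes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the ZooKeeper client API.
final class LeaderPicker {

    // Parses the 10-digit counter that ZooKeeper appends to a sequential
    // znode name, e.g. "n_0000000007" -> 7. Throws if the name is shorter
    // than 10 characters or the suffix is not numeric.
    public static int sequenceOf(String childName) {
        return Integer.parseInt(childName.substring(childName.length() - 10));
    }

    // Returns the child with the smallest sequence number, or empty if
    // there are no candidates.
    public static Optional<String> pickLeader(List<String> children) {
        return children.stream()
                .min(Comparator.comparingInt(LeaderPicker::sequenceOf));
    }

    public static void main(String[] args) {
        // Order is deliberately shuffled: a child listing is not sorted.
        List<String> children =
                List.of("n_0000000012", "n_0000000007", "n_0000000009");
        System.out.println(pickLeader(children).orElse("(no candidates)"));
    }
}
```

Each node would compare the returned name with the name of its own znode to
decide whether it is the leader now.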

The sequence numbers increase monotonically with the creation of each new
sequential znode. So a node that joins later gets a relatively large
sequence number, and it probably won't become leader immediately, as it is
'at the end of the queue'.
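
To make the queue behaviour concrete, here is a toy in-memory model of the
'/election' parent. It is only an illustration of the restart and late-join
cases described above, not how ZooKeeper is implemented, and it does not use
the ZooKeeper client. The class name, method names and the "n_" prefix are
assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for the '/election' parent znode; not the ZooKeeper API.
final class ElectionQueue {
    // Counter kept by the parent; it only ever increases.
    private int nextSequence = 0;
    // sequence number -> owner of that ephemeral znode
    private final TreeMap<Integer, String> members = new TreeMap<>();

    // Models creating an ephemeral sequential znode; returns its name.
    public String join(String owner) {
        int seq = nextSequence++;
        members.put(seq, owner);
        return String.format(Locale.ROOT, "n_%010d", seq);
    }

    // Models the end of a session: the owner's znodes disappear.
    public void disconnect(String owner) {
        members.values().removeIf(owner::equals);
    }

    // Owner of the smallest surviving sequence number, or null if empty.
    public String leader() {
        Map.Entry<Integer, String> first = members.firstEntry();
        return first == null ? null : first.getValue();
    }

    public static void main(String[] args) {
        ElectionQueue queue = new ElectionQueue();
        queue.join("A");
        queue.join("B");
        queue.join("C");
        System.out.println(queue.leader());   // A holds the smallest number

        queue.disconnect("A");                // A restarts: old znode is gone
        System.out.println(queue.join("A"));  // A rejoins with a new, larger number
        System.out.println(queue.leader());   // B is now at the head of the queue
    }
}
```

After the restart, A is behind both B and C, so it only becomes leader again
once their znodes are gone.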

Cheers,
Mate

On Wed, Dec 4, 2019 at 3:49 AM Isuru Boyagane <
isuruboyagane.16@cse.mrt.ac.lk> wrote:

> I read about leader election using zookeeper.
>
>  https://zookeeper.apache.org/doc/current/recipes.html#sc_leaderElection
>
> I have a small question to ask.
> After a node has created the "/election" node, every node creates child
> nodes with the sequential and ephemeral flags.
>
> Under what condition do the nodes decide to list all the child nodes and
> elect the smallest one as the leader? In other words, "what prevents a
> slightly late node joining the election". Is it a timeout? If so, how is it
> handled in ZooKeeper internally?
>
> Please advise.
> Thank you.
>
