hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "ZooKeeper/ZooKeeperRecipes" by Flavio Junqueira
Date Wed, 23 Jul 2008 20:49:10 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by Flavio Junqueira:
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperRecipes

The comment on the change is:
. 

------------------------------------------------------------------------------
  
  To solve the first problem, only the coordinator needs to be notified of changes to
the transaction nodes; it then notifies the sites once it reaches a decision. Note that
this approach, although more scalable, is slower because all communication has to go through
the coordinator. For the second problem, the coordinator can propagate the transaction
to the sites, and each site can create its own ephemeral node.
  
+ == Leader election ==
+ A simple way of doing leader election with ZooKeeper is to use the SEQUENCE|EPHEMERAL flags
when creating znodes that represent "proposals" of clients. The idea is to have a znode, say
"''/election''", such that each client creates a child znode "''/election/n_''" with both flags
SEQUENCE|EPHEMERAL. With the sequence flag, ZooKeeper automatically appends a sequence number
that is greater than any one previously appended to a child of "''/election''". The process that
created the znode with the smallest appended sequence number is the leader. 
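
As a rough illustration of the rule above, the following sketch picks the leader from a list of child names. It assumes that each child name ends in "_" followed by the decimal sequence number; the helper names `sequence_of` and `leader_of` and the sample child names are hypothetical and not part of any ZooKeeper API.

```python
from typing import Sequence


def sequence_of(name: str) -> int:
    """Return the numeric suffix of a child name such as 'n_0000000007'."""
    # Assumption: the sequence number is everything after the last underscore.
    return int(name.rsplit("_", 1)[1])


def leader_of(children: Sequence[str]) -> str:
    """Return the child with the smallest sequence number, i.e. the leader."""
    if not children:
        raise ValueError("no candidates under the election znode")
    return min(children, key=sequence_of)


children = ["n_0000000012", "n_0000000003", "n_0000000007"]
print(leader_of(children))  # prints n_0000000003
```

Comparing the parsed integers rather than the raw strings keeps the ordering correct even if the suffixes were not all the same width.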
+ 
+ That's not all, though. It is important to watch for failures of the leader, so that a new
client takes over as leader if the current leader fails. A trivial solution is
to have all application processes watch the current smallest znode and check whether
they are the new leader when the smallest znode goes away (note that the smallest znode will
go away if the leader fails because the node is ephemeral). This causes what we call "the
herd effect": upon failure of the current leader, all other processes receive a notification
and execute ''getChildren'' on "''/election''" to obtain the current list of children of "''/election''".
If the number of application clients is large, this causes a spike in the number of operations
that ZooKeeper servers have to process. To avoid the herd effect, it is sufficient for each client to watch
the next znode down in the sequence of znodes. If a client receives a notification that
the znode it is watching is gone, then it becomes the new leader if there is no smaller znode.
Note that this avoids the herd effect by not having all clients watch the same znode.
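
To make the difference concrete, here is a small sketch that counts how many clients are notified when the leader's znode disappears under each watching strategy. It works on bare sequence numbers and does not talk to ZooKeeper; the function names and the strategy labels `watch_leader` and `watch_predecessor` are made up for this example.

```python
def watch_targets(seqs, strategy):
    """Map each non-leader sequence number to the sequence number it watches."""
    ordered = sorted(seqs)
    if strategy == "watch_leader":
        # Trivial solution: everyone watches the smallest znode.
        return {s: ordered[0] for s in ordered[1:]}
    if strategy == "watch_predecessor":
        # Each client watches the next znode down in the sequence.
        return {s: ordered[k - 1] for k, s in enumerate(ordered) if k > 0}
    raise ValueError(f"unknown strategy: {strategy}")


def notified_when_deleted(seqs, strategy, deleted):
    """Return the clients whose watch fires when `deleted` goes away."""
    targets = watch_targets(seqs, strategy)
    return sorted(w for w, t in targets.items() if t == deleted)


seqs = range(1, 1001)  # 1000 clients; sequence number 1 is the leader
print(len(notified_when_deleted(seqs, "watch_leader", 1)))   # prints 999
print(notified_when_deleted(seqs, "watch_predecessor", 1))   # prints [2]
```

With the trivial solution every follower is notified and then calls ''getChildren''; with the predecessor scheme only one client is.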
+ 
+ Let ''ELECTION'' be a path of choice of the application. To volunteer to be a leader:
+  1. Create znode ''z'' with path "''ELECTION/n_''" with both '''SEQUENCE''' and '''EPHEMERAL'''
flags;
+  1. Let ''C'' be the children of "''ELECTION''", and ''i'' be the sequence number of ''z'';
+  1. Watch for changes on "''ELECTION/n_j''", where ''j'' is the largest sequence number
such that ''j < i'' and ''n_j'' is a znode in ''C'' (if no such ''j'' exists, ''z'' is the smallest znode and the client is the leader);
+ 
+ Upon receiving a notification of znode deletion:
+  1. Let ''C'' be the new set of children of ''ELECTION'';
+  1. If ''z'' is the smallest node in ''C'', then execute leader procedure;
+  1. Otherwise, watch for changes on "''ELECTION/n_j''", where ''j'' is the largest sequence
number such that ''j < i'' and ''n_j'' is a znode in ''C'';
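
The two procedures above can be sketched end to end against an in-memory stand-in for the children of ''ELECTION''. This is only a simulation under stated assumptions: `FakeElectionDir`, `Candidate`, and their methods are hypothetical names, not a ZooKeeper client API, and the zero-padded suffix format is chosen just for the fake store. Each candidate watches the znode with the largest sequence number below its own, so that only one client is notified per deletion.

```python
class FakeElectionDir:
    """In-memory stand-in for the children of ELECTION (not a real client)."""

    def __init__(self):
        self._next_seq = 0
        self.children = set()
        self.watches = {}  # watched child name -> list of callbacks

    def create_sequential(self, prefix="n_"):
        # Assumption for this fake store: zero-padded, ever-increasing suffix.
        name = f"{prefix}{self._next_seq:010d}"
        self._next_seq += 1
        self.children.add(name)
        return name

    def watch(self, name, callback):
        self.watches.setdefault(name, []).append(callback)

    def delete(self, name):
        # Models an ephemeral znode vanishing; watches fire once and are dropped.
        self.children.discard(name)
        for callback in self.watches.pop(name, []):
            callback()


def seq(name):
    return int(name.rsplit("_", 1)[1])


class Candidate:
    def __init__(self, directory):
        self.dir = directory
        self.alive = True
        self.is_leader = False
        self.watching = None
        self.z = directory.create_sequential()  # volunteer: create znode z
        self._check()                           # volunteer: read C, set watch

    def _check(self):
        """Runs when volunteering and on each deletion notification."""
        if not self.alive:
            return  # a failed client receives no notifications
        smaller = [c for c in self.dir.children if seq(c) < seq(self.z)]
        if not smaller:
            self.is_leader = True  # z is the smallest node in C
            self.watching = None
            return
        self.watching = max(smaller, key=seq)  # largest j with j < i
        self.dir.watch(self.watching, self._check)

    def fail(self):
        self.alive = False
        self.dir.delete(self.z)


d = FakeElectionDir()
a, b, c = Candidate(d), Candidate(d), Candidate(d)
print(a.is_leader, b.watching, c.watching)  # prints True n_0000000000 n_0000000001
b.fail()  # a middle candidate fails: only c is notified and re-targets a
print(c.watching, c.is_leader)              # prints n_0000000000 False
a.fail()  # the leader fails: c finds no smaller znode and takes over
print(c.is_leader)                          # prints True
```

Note that a deletion notification does not by itself mean the leader failed: when `b` fails, `c` re-reads the children and simply moves its watch to the next znode down.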
+ 
