hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "ZooKeeper/ClusterMembership" by FlavioJunqueira
Date Tue, 01 Jun 2010 20:52:54 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "ZooKeeper/ClusterMembership" page has been changed by FlavioJunqueira.


New page:
= Cluster Membership: ZOOKEEPER-107 =

== Motivation ==
ZooKeeper clusters are currently static: the set of servers participating in a cluster is
statically defined in a configuration file. In many instances, however, it is desirable to
be able to add servers to and remove servers from an ensemble. The difficulty in implementing
such a feature lies in making sure that a configuration change does not cause inconsistencies
in a ZooKeeper cluster. A related issue is enabling a client to learn the current
ensemble of servers dynamically.

== Requirements ==

 1. The ensemble of servers must agree upon the current configuration, and to reach agreement,
Zab sounds like the obvious choice;
 2. We need new client calls to add and remove servers. It is unclear whether we want one
call for each modification or one call to propose a whole new configuration;
 3. Reconfiguration must work with both majority and flexible quorums;
 4. We need a mechanism, perhaps based on URIs, to enable a client to learn the current configuration.

== Some pre-design random thoughts ==

When moving from one configuration to another, we need to make sure that both a quorum of the old
configuration and a quorum of the new configuration commit to the new configuration. A quorum
of the old configuration needs to agree to avoid a split-brain problem, for example, when
adding more servers. A quorum of the new configuration needs to agree so that the new
configuration can make progress. We also need to make sure that a quorum of the old
configuration confirms first; otherwise, a partition could cause a split-brain problem.
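
The commit rule above can be sketched as a check over the set of servers that have acknowledged the reconfiguration. This is only an illustration under stated assumptions: the class and method names are hypothetical and not part of ZooKeeper, server ids are plain longs, and only majority quorums are modeled (a flexible quorum scheme would substitute its own check for `hasMajority`). The requirement that the old configuration confirms first is not modeled here.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the joint-quorum rule; not ZooKeeper's actual API. */
final class JointQuorumSketch {

    /** True if the acks include a strict majority of the given configuration. */
    static boolean hasMajority(Set<Long> config, Set<Long> acks) {
        Set<Long> present = new HashSet<>(acks);
        present.retainAll(config); // count only acks from members of this configuration
        return present.size() > config.size() / 2;
    }

    /** A reconfiguration may commit only with a quorum of the old AND of the new configuration. */
    static boolean canCommitReconfig(Set<Long> oldConfig, Set<Long> newConfig, Set<Long> acks) {
        return hasMajority(oldConfig, acks) && hasMajority(newConfig, acks);
    }
}
```

For example, with an old configuration {1, 2, 3} and a new configuration {1, 2, 3, 4, 5}, acknowledgements from {3, 4, 5} form a majority of the new configuration but not of the old one, so the check refuses to commit. This is the situation in which servers 1 and 2 could otherwise keep operating as a quorum of the old configuration on the other side of a partition.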

If the current leader is part of both the old and the new configurations, then it can keep being the
leader once the new configuration is installed. Otherwise an epoch change becomes necessary.
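
As a small illustration of the rule above, the decision reduces to a membership test. The class and method names are hypothetical, and server ids are assumed to be longs.

```java
import java.util.Set;

/** Hypothetical sketch; not ZooKeeper's actual API. */
final class LeaderContinuitySketch {

    /** The leader may stay only if it belongs to both configurations; otherwise a new epoch is needed. */
    static boolean needsEpochChange(long leaderId, Set<Long> oldConfig, Set<Long> newConfig) {
        boolean inBoth = oldConfig.contains(leaderId) && newConfig.contains(leaderId);
        return !inBoth;
    }
}
```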

It is critical to make sure that every operation committed once a configuration is installed
is acknowledged by a quorum from the new configuration. Otherwise a leader crash can cause
committed operations to be lost. It might be simpler to stall the pipeline of request processors
when a reconfiguration goes through PrepRequestProcessor. By stalling I mean holding operations
until the reconfiguration operation is committed.
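
The stalling idea can be sketched as a gate in front of the pipeline that holds every operation arriving after a reconfiguration until that reconfiguration commits. This is a simplified, single-threaded illustration: the `ReconfigGateSketch` and `Op` names are hypothetical, and the sketch does not use ZooKeeper's request processor classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch of stalling the pipeline during a reconfiguration. */
final class ReconfigGateSketch {

    /** A request with a flag marking reconfiguration operations. */
    static final class Op {
        final String name;
        final boolean reconfig;
        Op(String name, boolean reconfig) { this.name = name; this.reconfig = reconfig; }
    }

    private final Queue<Op> held = new ArrayDeque<>();          // operations stalled behind a reconfiguration
    private final List<String> forwarded = new ArrayList<>();   // operations passed on to the pipeline, in order
    private boolean reconfigPending = false;

    /** Forward the operation, or hold it if a reconfiguration has not committed yet. */
    void submit(Op op) {
        if (reconfigPending) {
            held.add(op);
        } else {
            forward(op);
        }
    }

    /** Called once the pending reconfiguration is committed; releases held operations in order. */
    void onReconfigCommitted() {
        reconfigPending = false;
        // Stop draining again if a held operation is itself a reconfiguration.
        while (!reconfigPending && !held.isEmpty()) {
            forward(held.poll());
        }
    }

    private void forward(Op op) {
        forwarded.add(op.name);
        if (op.reconfig) {
            reconfigPending = true; // everything submitted after this waits for the commit
        }
    }

    List<String> forwarded() { return forwarded; }
}
```

Holding operations this way trades throughput during the reconfiguration for a simpler argument: no operation submitted after the reconfiguration enters the pipeline before the new configuration is in effect.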
