kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mpe...@apache.org
Subject [1/3] kudu git commit: [tools] Fix bug in CheckCompleteMove
Date Tue, 02 Oct 2018 18:52:25 GMT
Repository: kudu
Updated Branches:
  refs/heads/master 2de096ccb -> 6466c0d7d


[tools] Fix bug in CheckCompleteMove

It was possible for the following sequence to happen:

0. We are moving a replica from TS X to TS Y for tablet A. TS X is
   presently the leader.
1. We find the tablet leader (X) and build a proxy to it.
2. To remove X from A, we ask it to step down.
3. Leadership changes quickly and Z != X becomes the leader.
4. Since leadership has changed, we move to remove X from A. To prepare
   we gather consensus state using proxy, thinking we are talking to Z,
   but the proxy is pointed at X, causing a bad status like

   Invalid argument: GetConsensusState: Wrong destination UUID requested. Local UUID: X. Requested
UUID: Z

This bug has always been present but was exposed by the follow-up
graceful leadership transfer patch, since #3 was unlikely with abrupt
stepdown, and if CheckCompleteMove was retried after leadership changed
it would not hit the same problem.

This also reorganizes and re-comments CheckCompleteMove a bit, to try
and make it easier to understand.

Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Reviewed-on: http://gerrit.cloudera.org:8080/11508
Reviewed-by: Alexey Serbin <aserbin@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/9f9070a4
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/9f9070a4
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/9f9070a4

Branch: refs/heads/master
Commit: 9f9070a4d4c89727e823066a47edff40bbeb3187
Parents: 2de096c
Author: Will Berkeley <wdberkeley@gmail.org>
Authored: Tue Sep 25 11:11:22 2018 -0700
Committer: Will Berkeley <wdberkeley@gmail.com>
Committed: Mon Oct 1 18:52:25 2018 +0000

----------------------------------------------------------------------
 src/kudu/tools/tool_replica_util.cc | 106 ++++++++++++++++++++-----------
 1 file changed, 69 insertions(+), 37 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/9f9070a4/src/kudu/tools/tool_replica_util.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/tool_replica_util.cc b/src/kudu/tools/tool_replica_util.cc
index e06fbcd..f8e7339 100644
--- a/src/kudu/tools/tool_replica_util.cc
+++ b/src/kudu/tools/tool_replica_util.cc
@@ -202,23 +202,22 @@ Status CheckCompleteMove(const vector<string>& master_addresses,
   DCHECK(is_complete);
   DCHECK(completion_status);
   *is_complete = false;
-  // Get the latest leader info.
-  string leader_uuid;
-  HostPort leader_hp;
-  RETURN_NOT_OK(GetTabletLeader(client, tablet_id, &leader_uuid, &leader_hp));
+  // Get the latest leader info. It may change later, due to our actions or
+  // outside factors.
+  string orig_leader_uuid;
+  HostPort orig_leader_hp;
+  RETURN_NOT_OK(GetTabletLeader(client, tablet_id,
+                                &orig_leader_uuid, &orig_leader_hp));
   unique_ptr<ConsensusServiceProxy> proxy;
-  RETURN_NOT_OK(BuildProxy(leader_hp.host(), leader_hp.port(), &proxy));
+  RETURN_NOT_OK(BuildProxy(orig_leader_hp.host(), orig_leader_hp.port(), &proxy));
 
-  // Wait until 'from_ts_uuid' is no longer a member of the config.
+  // Check if the replica with UUID 'to_ts_uuid' is in the config, and if it has
+  // been promoted to voter.
   ConsensusStatePB cstate;
   bool is_343_scheme;
-  RETURN_NOT_OK(GetConsensusState(proxy, tablet_id, leader_uuid,
+  RETURN_NOT_OK(GetConsensusState(proxy, tablet_id, orig_leader_uuid,
                                   client->default_admin_operation_timeout(),
                                   &cstate, &is_343_scheme));
-  // In case if a leader replica is being replaced, there isn't much sense
-  // to make the leader stepping down when the newly added non-voter
-  // replica hasn't caught up yet. The stepped down leader replica will
-  // not be evicted until the newly added replica is promoted to voter.
   bool to_ts_uuid_in_config = false;
   bool to_ts_uuid_is_a_voter = false;
   for (const auto& peer : cstate.committed_config().peers()) {
@@ -231,8 +230,8 @@ Status CheckCompleteMove(const vector<string>& master_addresses,
     }
   }
 
-  // The failure case: the newly added replica is no longer in the config,
-  // it might be something wrong with the replica and the system kicked it out.
+  // The failure case: the newly added replica is no longer in the config.
+  // Maybe something was wrong with the replica and the system kicked it out?
   if (!to_ts_uuid_in_config) {
     *is_complete = true;
     *completion_status = Status::Incomplete(Substitute(
@@ -241,26 +240,42 @@ Status CheckCompleteMove(const vector<string>& master_addresses,
     return Status::OK();
   }
 
-  bool from_ts_uuid_in_config = false; // Is 'from_ts_uuid' still in the config?
+  // Check if the replica slated for removal (the one with UUID 'from_ts_uuid')
+  // is still in the config.
+  bool from_ts_uuid_in_config = false;
   for (const auto& peer : cstate.committed_config().peers()) {
     if (peer.permanent_uuid() == from_ts_uuid) {
       // Sanity check: in case of 3-4-3 replication mode, the source replica
       // must have the REPLACE attribute set. Otherwise, something has changed
       // in the middle and the source replica will never be evicted,
-      // so it does not make much sense awaiting for its removal.
+      // so it does not make sense to await its removal.
       if (is_343_scheme && !peer.attrs().replace()) {
         return Status::IllegalState(Substitute(
             "$0: source replica $1 does not have REPLACE attribute set",
             tablet_id, from_ts_uuid));
       }
-      // The information about the leader replica received by GetTabletLeader()
-      // might be stale and the actual leader might already be different. So,
-      // check that against the consensus information which is more up-to-date.
-      if (leader_uuid == from_ts_uuid && leader_uuid == cstate.leader_uuid() &&
+
+      // Handle the case when the replica-to-be-removed (the one with UUID
+      // 'from_ts_uuid') is the leader.
+      // - It's possible that leadership changed and 'orig_leader_uuid' is not
+      //   the leader's UUID by the time 'cstate' was collected. Let's
+      //   cross-reference the two sources and only act if they agree.
+      // - It doesn't make sense to have the leader step down if the newly-added
+      //   replica hasn't been promoted to a voter yet, since changing
+      //   leadership can only delay that process and the stepped-down leader
+      //   replica will not be evicted until the newly added replica is promoted
+      //   to voter.
+      // - When using the 3-4-3 replica management scheme, the leader master
+      //   will handle removing replicas, but when using the 3-2-3 scheme we
+      //   have to do it, so we only do the stepdown when the tablet is healthy
+      //   and we can proceed to kick out 'from_ts_uuid' once a new leader
+      //   is elected.
+      if (orig_leader_uuid == from_ts_uuid &&
+          orig_leader_uuid == cstate.leader_uuid() &&
           to_ts_uuid_is_a_voter &&
           (is_343_scheme || DoKsckForTablet(master_addresses, tablet_id).ok())) {
         // The leader is the node we intend to remove; make it step down.
-        ignore_result(DoLeaderStepDown(tablet_id, leader_uuid, leader_hp,
+        ignore_result(DoLeaderStepDown(tablet_id, orig_leader_uuid, orig_leader_hp,
                                        client->default_admin_operation_timeout()));
       }
       from_ts_uuid_in_config = true;
@@ -268,29 +283,46 @@ Status CheckCompleteMove(const vector<string>& master_addresses,
     }
   }
 
-  // In case of the 3-4-3 replica management scheme, once the newly added
-  // replica becomes a voter, the rest is taken care by the catalog manager (if
-  // the source replica is leader then it's also necessary to make make it step
-  // down). In contrast the 3-2-3 scheme requires explicitly removing the source
-  // replica since the catalog manager doesn't take care of
-  // over-replicated tablets.
-  if (!is_343_scheme && from_ts_uuid_in_config &&
+  // If we are operating under the 3-4-3 replica management scheme, the
+  // newly-added replica has been promoted to a voter, and (if it was leader)
+  // the replica-to-be-removed has stepped down as leader, then the move is
+  // complete. The leader master will take care of removing the extra replica.
+  if (is_343_scheme) {
+    if (to_ts_uuid_is_a_voter && !from_ts_uuid_in_config) {
+      *is_complete = true;
+      *completion_status = Status::OK();
+    }
+    return Status::OK();
+  }
+
+  // The 3-2-3 scheme requires explicitly removing the source replica since the
+  // leader master doesn't take care of over-replicated tablets. Once the newly
+  // added replica is caught up and ready (ksck returned OK), it's time to
+  // remove the source replica. Once the replica is gone, it will be detected
+  // next on the next retry of this function.
+  if (from_ts_uuid_in_config &&
       DoKsckForTablet(master_addresses, tablet_id).ok()) {
-    // Once the newly added replica is caught up and ready (ksck returned OK),
-    // it's time to remove the source replica. Once the replica is gone,
-    // that will be detected next cycle.
-    string leader_uuid;
-    HostPort leader_hp;
-    RETURN_NOT_OK(GetTabletLeader(client, tablet_id, &leader_uuid, &leader_hp));
-    if (from_ts_uuid != leader_uuid) {
+
+    // Re-find the leader in case it changed (we may have caused it to change).
+    string new_leader_uuid;
+    HostPort new_leader_hp;
+    RETURN_NOT_OK(GetTabletLeader(client, tablet_id,
+                                  &new_leader_uuid, &new_leader_hp));
+    // If leadership changed, we have to rebuild the proxy to the new leader.
+    if (new_leader_uuid != orig_leader_uuid) {
+      RETURN_NOT_OK(BuildProxy(new_leader_hp.host(), new_leader_hp.port(), &proxy));
+    }
+
+    // We can only act if the leader is not the replica being removed.
+    if (from_ts_uuid != new_leader_uuid) {
       // DoChangeConfig() might return InvalidState if the newly elected leader
       // hasn't yet committed a single operation on its term. Let's make sure
       // the current leader has asserted its leadership.
       ConsensusStatePB cstate;
-      RETURN_NOT_OK(GetConsensusState(proxy, tablet_id, leader_uuid,
+      RETURN_NOT_OK(GetConsensusState(proxy, tablet_id, new_leader_uuid,
                                       client->default_admin_operation_timeout(),
                                       &cstate));
-      if (!cstate.has_leader_uuid() || cstate.leader_uuid() != leader_uuid) {
+      if (!cstate.has_leader_uuid() || cstate.leader_uuid() != new_leader_uuid) {
         // Something has changed in the middle, the caller of this method
         // (i.e. the upper level) will need to retry.
         *is_complete = false;
@@ -301,7 +333,7 @@ Status CheckCompleteMove(const vector<string>& master_addresses,
       // Make sure the current leader has asserted its leadership before sending
       // it the ChangeConfig request.
       OpId opid;
-      RETURN_NOT_OK(GetLastCommittedOpId(tablet_id, leader_uuid, leader_hp,
+      RETURN_NOT_OK(GetLastCommittedOpId(tablet_id, new_leader_uuid, new_leader_hp,
                                          client->default_admin_operation_timeout(),
                                          &opid));
       if (opid.term() == cstate.current_term()) {


Mime
View raw message