kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From a...@apache.org
Subject kudu git commit: raft_consensus_nonvoter-itest: deflake a bit
Date Wed, 14 Nov 2018 23:42:45 GMT
Repository: kudu
Updated Branches:
  refs/heads/master eec20423b -> 240695332


raft_consensus_nonvoter-itest: deflake a bit

I saw a failure in ReplicaBehindWalGcThresholdITest.ReplicaReplacement
(GetParam() was (1, false)) just after the master was restarted:

  raft_consensus_nonvoter-itest.cc:2070: Failure
  Failed
  Bad status: Service unavailable: Leader not yet ready to serve requests

This is odd as there's a WaitForCatalogManager() call in there, so why would
a subsequent GetTabletLocations RPC return this ServiceUnavailable? As best
I can tell, the only way for this to happen is if the attempt to grab the
leadership lock from within the ListTables RPC (sent from
WaitForCatalogManager()) returns IllegalState, which it'll do if the
UUID in the master's cstate doesn't match the UUID on disk. Perhaps this can
happen during a leader master election; maybe the cstate's UUID becomes
empty for a little while? If that's true, this should fix the problem by
considering IllegalState to be a non-final state and continuing the loop.

I couldn't repro this failure, but Alexey managed to do so in a dist-test
loop with special latency injection enabled. Without the fix, 93 out of 256
runs failed, and with the fix, none failed.

Change-Id: I8192bd669e7e309943ea82718dd715238d520bbd
Reviewed-on: http://gerrit.cloudera.org:8080/11918
Reviewed-by: Alexey Serbin <aserbin@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/24069533
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/24069533
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/24069533

Branch: refs/heads/master
Commit: 240695332bef5e3e50d4473ad4c19f7b4786bc1b
Parents: eec2042
Author: Adar Dembo <adar@cloudera.com>
Authored: Fri Nov 9 11:03:26 2018 -0800
Committer: Adar Dembo <adar@cloudera.com>
Committed: Wed Nov 14 23:42:31 2018 +0000

----------------------------------------------------------------------
 .../raft_consensus_nonvoter-itest.cc                  |  2 +-
 src/kudu/mini-cluster/external_mini_cluster.cc        | 13 ++++++++-----
 src/kudu/mini-cluster/external_mini_cluster.h         | 14 ++++++++++++--
 3 files changed, 21 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/24069533/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc b/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
index de29869..2a11642 100644
--- a/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
+++ b/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
@@ -2055,7 +2055,7 @@ TEST_P(ReplicaBehindWalGcThresholdITest, ReplicaReplacement) {
           "--master_add_server_when_underreplicated=true");
       master->Shutdown();
       ASSERT_OK(master->Restart());
-      ASSERT_OK(master->WaitForCatalogManager());
+      ASSERT_OK(master->WaitForCatalogManager(ExternalMaster::WAIT_FOR_LEADERSHIP));
     }
   }
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/24069533/src/kudu/mini-cluster/external_mini_cluster.cc
----------------------------------------------------------------------
diff --git a/src/kudu/mini-cluster/external_mini_cluster.cc b/src/kudu/mini-cluster/external_mini_cluster.cc
index 4fbeb04..a7e3d6b 100644
--- a/src/kudu/mini-cluster/external_mini_cluster.cc
+++ b/src/kudu/mini-cluster/external_mini_cluster.cc
@@ -1189,7 +1189,7 @@ Status ExternalMaster::Restart() {
   return StartProcess(flags);
 }
 
-Status ExternalMaster::WaitForCatalogManager() {
+Status ExternalMaster::WaitForCatalogManager(WaitMode wait_mode) {
   unique_ptr<MasterServiceProxy> proxy(new MasterServiceProxy(
       opts_.messenger, bound_rpc_addr(), bound_rpc_addr().host()));
   Stopwatch sw;
@@ -1206,10 +1206,13 @@ Status ExternalMaster::WaitForCatalogManager() {
       }
       s = StatusFromPB(resp.error().status());
       if (s.IsIllegalState()) {
-        // This master is not the leader but is otherwise up and running.
-        break;
-      }
-      if (!s.IsServiceUnavailable()) {
+        if (wait_mode == DONT_WAIT_FOR_LEADERSHIP) {
+          // This master is not the leader but is otherwise up and running.
+          break;
+        }
+        DCHECK_EQ(wait_mode, WAIT_FOR_LEADERSHIP);
+        // Continue to the sleep below.
+      } else if (!s.IsServiceUnavailable()) {
         // Unexpected error from master.
         return s;
       }

http://git-wip-us.apache.org/repos/asf/kudu/blob/24069533/src/kudu/mini-cluster/external_mini_cluster.h
----------------------------------------------------------------------
diff --git a/src/kudu/mini-cluster/external_mini_cluster.h b/src/kudu/mini-cluster/external_mini_cluster.h
index 9dc2f87..8addfff 100644
--- a/src/kudu/mini-cluster/external_mini_cluster.h
+++ b/src/kudu/mini-cluster/external_mini_cluster.h
@@ -591,8 +591,18 @@ class ExternalMaster : public ExternalDaemon {
   virtual Status Restart() override WARN_UNUSED_RESULT;
 
   // Blocks until the master's catalog manager is initialized and responding to
-  // RPCs.
-  Status WaitForCatalogManager() WARN_UNUSED_RESULT;
+  // RPCs. If 'wait_mode' is WAIT_FOR_LEADERSHIP, will further block until the
+  // master has been elected leader.
+  //
+  // WAIT_FOR_LEADERSHIP should only be used in single-master environments; the
+  // wait may time out in a multi-master environment as there's no guarantee
+  // that this particular master will be elected leader.
+  enum WaitMode {
+    WAIT_FOR_LEADERSHIP,
+    DONT_WAIT_FOR_LEADERSHIP
+  };
+  Status WaitForCatalogManager(
+      WaitMode wait_mode = DONT_WAIT_FOR_LEADERSHIP) WARN_UNUSED_RESULT;
 
  private:
   std::vector<std::string> GetCommonFlags() const;


Mime
View raw message