kudu-commits mailing list archives

From t...@apache.org
Subject [1/4] incubator-kudu git commit: itests: address some multi-master flakiness
Date Tue, 01 Mar 2016 07:46:17 GMT
Repository: incubator-kudu
Updated Branches:
  refs/heads/master b17d0d539 -> 1ff209e85


itests: address some multi-master flakiness

As part of the work to ensure multi-master works properly, I did a pass over
the multi-master integration tests and their flaky failures. Here's what I
found.

Many tests are at least a little flaky due to KUDU-1358, as nothing prevents
a master leader election from taking place at any time, including during the
create table operation that sets up a test.

master_failover-itest.cc
- TestCreateTableSync must remain disabled due to KUDU-1358.
- TestPauseAfterCreateTableIssued was disabled from day one (commit
  6be4c23). I suspect this was due to poor deadline handling that has since
  been fixed, so I'm re-enabling it. It has survived 1000 TSAN runs on ve0518.
- TestDeleteTableSync is flaky due to timeouts in DeleteTable(). Elsewhere
  we use 90s timeouts, so I've made a change to do the same here.
- TestRenameTableSync is flaky due to KUDU-1353.

master_replication-itest.cc
- TestSysTablesReplication had a TODO, along with a second CreateClient()
  call, that didn't make sense. It was added here:
  http://gerrit.sjc.cloudera.com:8080/#/c/5483/20..21. The test survives
  1000 runs on my laptop with both removed.
- TestCycleThroughAllMasters was flaky due to timeouts in Build(). I suspect
  a default RPC timeout that isn't long enough for TSAN/ASAN builds, as it
  is used as the overall deadline for the leader master RPC "fan out".

Change-Id: I1af480f820f7fce922ed9b9712ee4b6376c352a7
Reviewed-on: http://gerrit.cloudera.org:8080/2368
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>
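
The TestCycleThroughAllMasters finding above can be illustrated with a toy
model. This is a minimal sketch, not Kudu's client code:
LeaderFoundBeforeDeadline and its parameters are hypothetical names, and time
is simulated rather than measured. It assumes every probe of the masters takes
a fixed cost and that probing stops as soon as another probe would overrun the
overall deadline.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

// Hypothetical model (not Kudu's implementation) of a client that keeps
// probing the masters until it finds a leader or an overall deadline expires.
//
//   overall_deadline:    total time the caller is willing to wait.
//   cluster_ready_after: simulated time at which the cluster comes up.
//   probe_cost:          simulated duration of one probe; must be positive.
bool LeaderFoundBeforeDeadline(milliseconds overall_deadline,
                               milliseconds cluster_ready_after,
                               milliseconds probe_cost) {
  assert(probe_cost.count() > 0);
  milliseconds now{0};
  // Only start a probe if it can finish within the overall deadline.
  while (now + probe_cost <= overall_deadline) {
    if (now >= cluster_ready_after) {
      return true;  // This probe reaches a live cluster.
    }
    now += probe_cost;  // The probe failed; account for its cost and retry.
  }
  return false;  // The deadline expired before any probe could succeed.
}
```

Under these assumptions, a cluster that stays down for 1000 ms is never found
with a 500 ms overall deadline, but is found with a 15 s one. The same
reasoning suggests why a deadline that suits a release build may be too short
for a slower TSAN or ASAN build.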


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/ffbe5a66
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/ffbe5a66
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/ffbe5a66

Branch: refs/heads/master
Commit: ffbe5a66eb61a8af990ed98a84c01e044533d376
Parents: b17d0d5
Author: Adar Dembo <adar@cloudera.com>
Authored: Wed Feb 24 15:04:27 2016 -0800
Committer: Adar Dembo <adar@cloudera.com>
Committed: Tue Mar 1 07:36:35 2016 +0000

----------------------------------------------------------------------
 src/kudu/client/client.h                         |  2 +-
 .../integration-tests/master_failover-itest.cc   |  9 ++++++---
 .../master_replication-itest.cc                  | 19 ++++++++++---------
 3 files changed, 17 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/ffbe5a66/src/kudu/client/client.h
----------------------------------------------------------------------
diff --git a/src/kudu/client/client.h b/src/kudu/client/client.h
index 05af633..498c20e 100644
--- a/src/kudu/client/client.h
+++ b/src/kudu/client/client.h
@@ -285,7 +285,7 @@ class KUDU_EXPORT KuduClient : public sp::enable_shared_from_this<KuduClient> {
   FRIEND_TEST(ClientTest, TestScanFaultTolerance);
   FRIEND_TEST(ClientTest, TestScanTimeout);
   FRIEND_TEST(ClientTest, TestWriteWithDeadMaster);
-  FRIEND_TEST(MasterFailoverTest, DISABLED_TestPauseAfterCreateTableIssued);
+  FRIEND_TEST(MasterFailoverTest, TestPauseAfterCreateTableIssued);
 
   KuduClient();
 

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/ffbe5a66/src/kudu/integration-tests/master_failover-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/master_failover-itest.cc b/src/kudu/integration-tests/master_failover-itest.cc
index 02d9655..6ed2f42 100644
--- a/src/kudu/integration-tests/master_failover-itest.cc
+++ b/src/kudu/integration-tests/master_failover-itest.cc
@@ -90,6 +90,11 @@ class MasterFailoverTest : public KuduTest {
     cluster_.reset(new ExternalMiniCluster(opts_));
     ASSERT_OK(cluster_->Start());
     KuduClientBuilder builder;
+
+    // Create and alter table operation timeouts can be extended via their
+    // builders, but there's no such option for DeleteTable, so we extend
+    // the global operation timeout.
+    builder.default_admin_operation_timeout(MonoDelta::FromSeconds(90));
     ASSERT_OK(cluster_->CreateClient(builder, &client_));
   }
 
@@ -103,7 +108,6 @@ class MasterFailoverTest : public KuduTest {
     gscoped_ptr<KuduTableCreator> table_creator(client_->NewTableCreator());
     return table_creator->table_name(table_name)
         .schema(&schema)
-        .timeout(MonoDelta::FromSeconds(90))
         .wait(mode == kWaitForCreate)
         .Create();
   }
@@ -112,7 +116,6 @@ class MasterFailoverTest : public KuduTest {
     gscoped_ptr<KuduTableAlterer> table_alterer(client_->NewTableAlterer(table_name_orig));
     return table_alterer
       ->RenameTo(table_name_new)
-      ->timeout(MonoDelta::FromSeconds(90))
       ->wait(true)
       ->Alter();
   }
@@ -173,7 +176,7 @@ TEST_F(MasterFailoverTest, DISABLED_TestCreateTableSync) {
 //
 // TODO enable this test once flakiness issues are worked out and
 // eliminated on test machines.
-TEST_F(MasterFailoverTest, DISABLED_TestPauseAfterCreateTableIssued) {
+TEST_F(MasterFailoverTest, TestPauseAfterCreateTableIssued) {
   if (!AllowSlowTests()) {
     LOG(INFO) << "This test can only be run in slow mode.";
     return;

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/ffbe5a66/src/kudu/integration-tests/master_replication-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/master_replication-itest.cc b/src/kudu/integration-tests/master_replication-itest.cc
index f3fb1bc..19b4d8c 100644
--- a/src/kudu/integration-tests/master_replication-itest.cc
+++ b/src/kudu/integration-tests/master_replication-itest.cc
@@ -84,9 +84,9 @@ class MasterReplicationTest : public KuduTest {
   }
 
   // This method is meant to be run in a separate thread.
-  void StartClusterDelayed(int64_t micros) {
-    LOG(INFO) << "Sleeping for "  << micros << " micro seconds...";
-    SleepFor(MonoDelta::FromMicroseconds(micros));
+  void StartClusterDelayed(int64_t millis) {
+    LOG(INFO) << "Sleeping for "  << millis << " ms...";
+    SleepFor(MonoDelta::FromMilliseconds(millis));
     LOG(INFO) << "Attempting to start the cluster...";
     CHECK_OK(cluster_->Start());
     CHECK_OK(cluster_->WaitForTabletServerCount(kNumTabletServerReplicas));
@@ -151,9 +151,6 @@ TEST_F(MasterReplicationTest, TestSysTablesReplication) {
   ASSERT_OK(CreateClient(&client));
   ASSERT_OK(CreateTable(client, kTableId1));
 
-  // TODO: once fault tolerant DDL is in, remove the line below.
-  ASSERT_OK(CreateClient(&client));
-
   ASSERT_OK(cluster_->WaitForTabletServerCount(kNumTabletServerReplicas));
 
   // Repeat the same for the second table.
@@ -195,15 +192,19 @@ TEST_F(MasterReplicationTest, TestCycleThroughAllMasters) {
   ASSERT_OK(Thread::Create("TestCycleThroughAllMasters", "start_thread",
                                   &MasterReplicationTest::StartClusterDelayed,
                                   this,
-                                  100 * 1000, // start after 100 millis.
+                                  1000, // start after 1000 millis.
                                   &start_thread));
 
   // Verify that the client doesn't give up even though the entire
-  // cluster is down for 100 milliseconds.
+  // cluster is down for a little while.
+  //
+  // The timeouts for both RPCs and operations are increased to cope with slow
+  // clusters (i.e. TSAN builds).
   shared_ptr<KuduClient> client;
   KuduClientBuilder builder;
   builder.master_server_addrs(master_addrs);
-  builder.default_admin_operation_timeout(MonoDelta::FromSeconds(15));
+  builder.default_admin_operation_timeout(MonoDelta::FromSeconds(90));
+  builder.default_rpc_timeout(MonoDelta::FromSeconds(15));
   EXPECT_OK(builder.Build(&client));
 
   ASSERT_OK(ThreadJoiner(start_thread.get()).Join());
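
The StartClusterDelayed() change above switches a raw int64_t delay from
microseconds to milliseconds, so the unit lives only in the parameter name and
a comment. As a hedged aside, the sketch below shows how a typed duration can
carry the unit instead. It uses std::chrono as a stand-in for Kudu's MonoDelta,
and StartDelayMicros is a hypothetical helper that does not exist in Kudu.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of Kudu): takes a typed duration and returns
// the raw microsecond count that an untyped sleep primitive would need.
// Callers may pass any coarser std::chrono duration; the conversion to
// microseconds is implicit and lossless.
constexpr std::int64_t StartDelayMicros(std::chrono::microseconds delay) {
  return delay.count();
}

// The unit is stated at the call site, so "1000" cannot be misread.
static_assert(StartDelayMicros(std::chrono::milliseconds(1000)) == 1000000,
              "1000 ms is 1,000,000 us");
static_assert(StartDelayMicros(std::chrono::microseconds(100 * 1000)) == 100000,
              "100 * 1000 us is 100 ms");
```

With this shape, passing a bare integer does not compile, because the
std::chrono::duration constructor taking a count is explicit.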

