kudu-commits mailing list archives

From t...@apache.org
Subject [2/4] kudu git commit: delete_table-test: fix flakiness with table creation timeout
Date Wed, 05 Oct 2016 21:29:43 GMT
delete_table-test: fix flakiness with table creation timeout

This test was timing out frequently when trying to create a table
with replication factor 2 on a cluster with 3 tablet servers, one of
which had recently been shut down. The master could try to place a
replica on the non-running server; that attempt would have to time
out before a new placement was tried.

The workaround here is to restart the master so it no longer sees the
crashed server as a valid placement option.

Change-Id: Ic61ad384e1b247f83bfc709528c4c7bda586c9d2
Reviewed-on: http://gerrit.cloudera.org:8080/4632
Reviewed-by: David Ribeiro Alves <dralves@apache.org>
Reviewed-by: Dinesh Bhat <dinesh@cloudera.com>
Tested-by: Kudu Jenkins

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/98f42cdd
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/98f42cdd
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/98f42cdd

Branch: refs/heads/master
Commit: 98f42cdd878caa429377625a2288d22ed0d114f2
Parents: 0f99d40
Author: Todd Lipcon <todd@apache.org>
Authored: Wed Oct 5 10:52:29 2016 -0700
Committer: David Ribeiro Alves <dralves@apache.org>
Committed: Wed Oct 5 20:26:40 2016 +0000

 src/kudu/integration-tests/delete_table-test.cc | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/kudu/integration-tests/delete_table-test.cc b/src/kudu/integration-tests/delete_table-test.cc
index 6a0de2f..d331d43 100644
--- a/src/kudu/integration-tests/delete_table-test.cc
+++ b/src/kudu/integration-tests/delete_table-test.cc
@@ -432,7 +432,7 @@ TEST_F(DeleteTableTest, TestAutoTombstoneAfterCrashDuringTabletCopy) {
   ASSERT_OK(cluster_->WaitForTabletServerCount(1, MonoDelta::FromSeconds(30)));
-  // Set up a table which has a table only on TS 0. This will be used to test for
+  // Set up a table which has a tablet only on TS 0. This will be used to test for
   // "collateral damage" bugs where incorrect handling of the main test tablet
   // accidentally removes blocks from another tablet.
   // We use a sequential workload so that we just flush and don't compact.
@@ -467,7 +467,15 @@ TEST_F(DeleteTableTest, TestAutoTombstoneAfterCrashDuringTabletCopy)
-  // Create a new tablet which is replicated on the other two servers.
+  // Restart the master to be sure that it only sees the live servers.
+  // Otherwise it may try to create a tablet with a replica on the down server.
+  // The table creation would eventually succeed after picking a different set of
+  // replicas, but not before causing a timeout.
+  cluster_->master()->Shutdown();
+  ASSERT_OK(cluster_->master()->Restart());
+  ASSERT_OK(cluster_->WaitForTabletServerCount(2, MonoDelta::FromSeconds(30)));
+  // Create a new table with a single tablet replicated on the other two servers.
   // We use the same sequential workload. This produces block ID sequences
   // that look like:
   //   TS 0: |---- blocks from 'other-table' ---]
