kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From t...@apache.org
Subject [3/3] kudu git commit: test_util: include shard number in test scratch directory name
Date Mon, 16 Apr 2018 17:09:28 GMT
test_util: include shard number in test scratch directory name

Amongst other things, build-and-test.sh checks that every test cleans up its
scratch directory. It only performs this check if all tests pass, because if
a test fails, it'll leave behind its scratch directory for inspection.

This gets a little murky when considering flaky tests, because if one fails,
it is retried a few times. As such, it's important that when run-test.sh
prepares to rerun a flaky test, it cleans up the scratch directory left
behind by the failed run. Unfortunately, run-test.sh isn't really aware of
what it needs to clean up, because each test in the test executable creates
its own test-specific scratch directory within a common root. To account for
this, run-test.sh figures out what to clean using the following algorithm:
1. Before the test, capture the list of all directories within TEST_TMPDIR.
2. After the test, repeat the capture.
3. Diff the two captures to extract a list of directories that exist after
   but not before.
4. Search for the test name as a prefix in the extracted list.

While complex, the algorithm does match and delete all scratch directories
that were left behind by a particular test executable, even if TEST_TMPDIR
is in use by other tests running in parallel. However, sharded tests break
the algorithm, because the test name now includes a shard index, and that
index isn't included in the scratch directory's name.

I experimented with various fixes that included giving run-test.sh more
ownership over TEST_TMPDIR, but eventually settled on a smaller and more
targeted fix: if the test is being sharded, include the shard index in the
scratch directory name.

I tested this patch by running:

  TEST_TMPDIR=/tmp/kudutest-1000 \
  KUDU_FLAKY_TEST_LIST=<(echo raft_consensus-itest) \
    ctest -j8 -R raft_consensus-itest

In the background, I also induced stress on the machine. This caused
RaftConsensusParamReplicationModesITest.Test_KUDU_1735/1 to flake and fail
from time to time.

Without the patch, the flaky test's scratch directory was always left behind
in TEST_TMPDIR. With it, the directory was always gone, unless the test
failed every time it was run, which is expected behavior.

Change-Id: Ie5686eb7c3a011a94b6c8c4eadbedb23acbf6e60
Reviewed-on: http://gerrit.cloudera.org:8080/10069
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/baa1e677
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/baa1e677
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/baa1e677

Branch: refs/heads/master
Commit: baa1e677fd858a2be17a316fd768746a8e076f88
Parents: c260a32
Author: Adar Dembo <adar@cloudera.com>
Authored: Fri Apr 13 16:42:47 2018 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Sat Apr 14 01:00:35 2018 +0000

 src/kudu/util/test_util.cc | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/kudu/util/test_util.cc b/src/kudu/util/test_util.cc
index 30726a5..a4ffd01 100644
--- a/src/kudu/util/test_util.cc
+++ b/src/kudu/util/test_util.cc
@@ -214,8 +214,17 @@ string GetTestDataDirectory() {
   // - timestamp and pid: disambiguates with prior runs of the same test
   // e.g. "env-test.TestEnv.TestReadFully.1409169025392361-23600"
-  dir += Substitute("/$0.$1.$2.$3-$4",
+  //
+  // If the test is sharded, the shard index is also included so that the test
+  // invoker can more easily identify all directories belonging to each shard.
+  string shard_index_infix;
+  const char* shard_index = getenv("GTEST_SHARD_INDEX");
+  if (shard_index && shard_index[0] != '\0') {
+    shard_index_infix = Substitute("$0.", shard_index);
+  }
+  dir += Substitute("/$0.$1$2.$3.$4-$5",
     StringReplace(google::ProgramInvocationShortName(), "/", "_", true),
+    shard_index_infix,
     StringReplace(test_info->test_case_name(), "/", "_", true),
     StringReplace(test_info->name(), "/", "_", true),

View raw message