zookeeper-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From h...@apache.org
Subject zookeeper git commit: ZOOKEEPER-2577: Fix flaky testDuringLeaderSync test.
Date Thu, 27 Jul 2017 21:10:59 GMT
Repository: zookeeper
Updated Branches:
  refs/heads/master 41b30a74e -> 32794ca5d


ZOOKEEPER-2577: Fix flaky testDuringLeaderSync test.

testDuringLeaderSync uses the presence of intermediate zoo.cfg.dynamic file to decide if the
reconfig operation was succeeded or not. This is not a problem and is logically correct, however
in tests that provisions QuorumPeer directly through MainThread, the configFile will be null
and the resulted intermediate dynamic reconfig file will has a name of "null.cfg.dynamic".
This causes problem because multiple test cases use MainThread to provision QuorumPeer so
these tests will share a single "null.cfg.dynamic" file, as opposed to the zoo.cfg.dynamic
file in their specific test folder when configFile was not null. Since we support running
concurrent ant unit tests in apache build infrastructure, it is highly likely that multiple
tests that rely on this shared null.cfg.dynamic file will execute simultaneously which create
all sorts of failure cases (this also explains why if one tries to reproduce the test failures
in a standalone fashion the test will always pass, with the absence of t
 he file sharing.).

This is problematic in multiple test cases and in particular cause this testDuringLeaderSync
fail fairly frequently. There are multiple solutions to this problem:

* Implement retry with timeout logic when detecting the presence of intermediate config files
in testDuringLeaderSync. This will fix this specific test but the fix is non-deterministic
and other tests might still fail because of sharing of null.cfg.dynamic file.

* Always make sure to to feed a real config file when provision QuorumPeer. This however is
an overkill as some cases a pure artificial QuorumPeer w/o real config file fit the use case
well.

The approach taking here is to for this specific test, making sure we always have a configured
confFileName, and it is pretty trivial to do so. For other tests, where the intermediate config
file name is not important or irrelevant (e.g. FLE related tests), they will still have null
confFileName because they directly provision QuorumPeer during test.

Testing done: running this patch on apache jenkins for the past week with 500 runs. Not only
this test is fixed but overall stability of entirely unit tests are improved.

Author: Michael Han <hanm@apache.org>

Reviewers: Alexander Shraer <shralex@gmail.com>

Closes #315 from hanm/working


Project: http://git-wip-us.apache.org/repos/asf/zookeeper/repo
Commit: http://git-wip-us.apache.org/repos/asf/zookeeper/commit/32794ca5
Tree: http://git-wip-us.apache.org/repos/asf/zookeeper/tree/32794ca5
Diff: http://git-wip-us.apache.org/repos/asf/zookeeper/diff/32794ca5

Branch: refs/heads/master
Commit: 32794ca5d5a233abc6e019b07aadbfd375060bbd
Parents: 41b30a7
Author: Michael Han <hanm@apache.org>
Authored: Thu Jul 27 14:09:10 2017 -0700
Committer: Michael Han <hanm@apache.org>
Committed: Thu Jul 27 14:10:51 2017 -0700

----------------------------------------------------------------------
 .../org/apache/zookeeper/server/quorum/QuorumPeer.java    | 10 ++++++++--
 .../apache/zookeeper/server/quorum/QuorumPeerConfig.java  |  3 ++-
 .../server/quorum/ReconfigDuringLeaderSyncTest.java       |  1 +
 3 files changed, 11 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/zookeeper/blob/32794ca5/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
----------------------------------------------------------------------
diff --git a/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java b/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
index da99fd5..01e5947 100644
--- a/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
+++ b/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
@@ -1432,6 +1432,10 @@ public class QuorumPeer extends ZooKeeperThread implements QuorumStats.Provider
     }
 
     public String getNextDynamicConfigFilename() {
+        if (configFilename == null) {
+            LOG.warn("configFilename is null! This should only happen in tests.");
+            return null;
+        }
         return configFilename + QuorumPeerConfig.nextDynamicConfigFileSuffix;
     }
     
@@ -1452,8 +1456,10 @@ public class QuorumPeer extends ZooKeeperThread implements QuorumStats.Provider
             connectNewPeers();
             if (writeToDisk) {
                 try {
-                    QuorumPeerConfig.writeDynamicConfig(
-                            getNextDynamicConfigFilename(), qv, true);
+                    String fileName = getNextDynamicConfigFilename();
+                    if (fileName != null) {
+                        QuorumPeerConfig.writeDynamicConfig(fileName, qv, true);
+                    }
                 } catch (IOException e) {
                     LOG.error("Error writing next dynamic config file to disk: ", e.getMessage());
                 }

http://git-wip-us.apache.org/repos/asf/zookeeper/blob/32794ca5/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java
----------------------------------------------------------------------
diff --git a/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java b/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java
index 96e32d8..02bbe76 100644
--- a/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java
+++ b/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java
@@ -515,7 +515,8 @@ public class QuorumPeerConfig {
     }
 
 
-    public static void deleteFile(String filename){        
+    public static void deleteFile(String filename){
+       if (filename == null) return;
        File f = new File(filename);
        if (f.exists()) {
            try{ 

http://git-wip-us.apache.org/repos/asf/zookeeper/blob/32794ca5/src/java/test/org/apache/zookeeper/server/quorum/ReconfigDuringLeaderSyncTest.java
----------------------------------------------------------------------
diff --git a/src/java/test/org/apache/zookeeper/server/quorum/ReconfigDuringLeaderSyncTest.java
b/src/java/test/org/apache/zookeeper/server/quorum/ReconfigDuringLeaderSyncTest.java
index f7f0c7c..f350abf 100644
--- a/src/java/test/org/apache/zookeeper/server/quorum/ReconfigDuringLeaderSyncTest.java
+++ b/src/java/test/org/apache/zookeeper/server/quorum/ReconfigDuringLeaderSyncTest.java
@@ -257,6 +257,7 @@ public class ReconfigDuringLeaderSyncTest extends QuorumPeerTestBase {
             quorumPeer = new CustomQuorumPeer(config.getQuorumVerifier().getAllMembers(),
config.getDataDir(),
                     config.getDataLogDir(), config.getClientPortAddress().getPort(), config.getElectionAlg(),
                     config.getServerId(), config.getTickTime(), config.getInitLimit(), config.getSyncLimit());
+            quorumPeer.setConfigFileName(config.getConfigFilename());
             quorumPeer.start();
             try {
                 quorumPeer.join();


Mime
View raw message