Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm
From: afine
To: dev@zookeeper.apache.org
Reply-To: dev@zookeeper.apache.org
Subject: [GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...
Content-Type: text/plain
Message-Id: <20171213210033.25267DFFD9@git1-us-west.apache.org>
Date: Wed, 13 Dec 2017 21:00:32 +0000 (UTC)
archived-at: Wed, 13 Dec 2017 21:00:36 -0000

Github user afine commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/432#discussion_r156781040

    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
    @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception {
                     output[0], 2);
         }

    +    /**
    +     * This test validates that if a quorum member determines that it is leader without the support of the rest of the
    +     * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower.
    +     *
    +     * @throws IOException
    +     * @throws InterruptedException
    +     */
    +    @Test
    +    public void testElectionFraud() throws IOException, InterruptedException {
    +        // capture QuorumPeer logging
    +        Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout();
    +        ByteArrayOutputStream os = new ByteArrayOutputStream();
    +        WriterAppender appender = new WriterAppender(layout, os);
    +        appender.setThreshold(Level.INFO);
    +        Logger qlogger = Logger.getLogger(QuorumPeer.class);
    +        qlogger.addAppender(appender);
    +
    +        int numServers = 3;
    +
    +        // used for assertions later
    +        boolean foundLeading = false;
    +        boolean foundLooking = false;
    +        boolean foundFollowing = false;
    +
    +        try {
    +            // spin up a quorum, we use a small ticktime to make the test run faster
    +            Servers servers = LaunchServers(numServers, 500);
    +
    +            // find the leader
    +            int trueLeader = -1;
    +            for (int i = 0; i < numServers; i++) {
    +                if (servers.mt[i].main.quorumPeer.leader != null) {
    +                    trueLeader = i;
    +                }
    +            }
    +            Assert.assertTrue("There should be a leader", trueLeader >= 0);
    +
    +            // find a follower
    +            int falseLeader = (trueLeader + 1) % numServers;
    +            Assert.assertTrue(servers.mt[falseLeader].main.quorumPeer.follower != null);
    +
    +            // to keep the quorum peer running and force it to go into the looking state, we kill leader election
    +            // and close the connection to the leader
    +            servers.mt[falseLeader].main.quorumPeer.electionAlg.shutdown();
    +            servers.mt[falseLeader].main.quorumPeer.follower.getSocket().close();
    +
    +            // wait for the falseLeader to disconnect
    +            waitForOne(servers.zk[falseLeader], States.CONNECTING);
    +
    +            // convince falseLeader that it is the leader
    +            servers.mt[falseLeader].main.quorumPeer.setPeerState(QuorumPeer.ServerState.LEADING);
    +
    +            // provide time for the falseleader to realize no followers have connected
    +            // (this is twice the timeout used in Leader#getEpochToPropose)
    +            Thread.sleep(2 * servers.mt[falseLeader].main.quorumPeer.initLimit * servers.mt[falseLeader].main.quorumPeer.tickTime);
    +
    +            // Restart leader election
    +            servers.mt[falseLeader].main.quorumPeer.startLeaderElection();
    --- End diff --

    Stopping and restarting leader election is necessary here to prevent a race condition. Without it, the server could rejoin the quorum as a follower after being disconnected from the leader but before the test reaches `servers.mt[falseLeader].main.quorumPeer.setPeerState(QuorumPeer.ServerState.LEADING);`. In that case falseLeader would never try to `lead`, defeating the purpose of the test.

---
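
To make the interleaving concrete, here is a toy, single-threaded model of the race. It is only a sketch and not ZooKeeper code: `ToyPeer`, its fields, `step()` and `attemptsToLead()` are all hypothetical names, and the model assumes that a peer which is busy following does not re-read its state field.

```java
// Toy model of the race described above. Every name here is hypothetical;
// none of this is taken from ZooKeeper's QuorumPeer.
class ElectionRaceSketch {
    enum State { LOOKING, FOLLOWING, LEADING }

    static class ToyPeer {
        State state = State.LOOKING;         // the peer has just lost its leader
        boolean electionEnabled = true;      // false models a shut-down election
        boolean insideFollowerLoop = false;  // true once the peer is busy following
        boolean attemptedToLead = false;     // what the test ultimately wants to see

        // One iteration of a simplified main loop.
        void step() {
            if (insideFollowerLoop) {
                return; // assumption: a busy follower ignores changes to `state`
            }
            switch (state) {
                case LOOKING:
                    if (electionEnabled) {
                        // The election finds the real leader again.
                        state = State.FOLLOWING;
                        insideFollowerLoop = true;
                    }
                    break;
                case LEADING:
                    // The peer tries to lead, gets no followers, and gives up.
                    attemptedToLead = true;
                    state = State.LOOKING;
                    break;
                default:
                    break;
            }
        }
    }

    // Replays the unlucky ordering: the peer's loop runs once before the
    // test thread forces the LEADING state.
    static boolean attemptsToLead(boolean stopElectionFirst) {
        ToyPeer peer = new ToyPeer();
        peer.electionEnabled = !stopElectionFirst;
        peer.step();                  // the peer wins the race
        peer.state = State.LEADING;   // the test forces the state afterwards
        peer.step();
        return peer.attemptedToLead;
    }

    public static void main(String[] args) {
        System.out.println("election left running: " + attemptsToLead(false));
        System.out.println("election stopped first: " + attemptsToLead(true));
    }
}
```

In this model the forced LEADING state is only acted on when the election was stopped beforehand. With the election left running, the peer is already following by the time the state is overwritten.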