kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jtbirds...@apache.org
Subject [1/2] kudu git commit: KUDU-1735. Fix crash when aborting a skipped config change round
Date Tue, 08 Nov 2016 17:39:24 GMT
Repository: kudu
Updated Branches:
  refs/heads/master 141de3377 -> cf757c12b


KUDU-1735. Fix crash when aborting a skipped config change round

This fixes a crash seen in a test cluster when deleting a tablet
in which there is a pending (uncommitted) REPLICATE message for a
skipped CONFIG_CHANGE operation. The CONFIG_CHANGE was skipped because
the tablet already had a higher indexed configuration written to disk.

This can happen in a couple cases: one, exercised by a test included
here, is if a tablet server crashes just after committing a config
change to disk but before writing the COMMIT message to the WAL, then
upon restart that config change will be a pending operation despite being
committed. If the table is then deleted before the operation is
committed, it would crash on abort.

The approach taken to fix this is as follows:

When aborting a config change operation, we only clear the 'pending'
config if we see that its index matches the index of the operation being
aborted. Otherwise, we ignore the abort (whereas we used to crash).

In order to achieve this, we have to remember the index of the pending
config, which previously wasn't set until after the commit. So, this
assignment is moved out of the commit path and into
AddPendingOperationUnlocked(). Changing the assumption of where the
index was set involved making a few corresponding changes to DCHECKs
elsewhere in the code.

There is a slight wire/upgrade compatibility issue here: previously the
leader would replicate config change messages with no opid set in the
'new_config'. Now, it does. I think this would cause DCHECKs to fire in
a mixed-version cluster, though no CHECKs. Additionally, we aren't
purporting to fully support rolling upgrade yet, so I don't think it's a
big deal. I was able to upgrade the test cluster which experienced this
issue in-place and it came up fine.

This patch removes raft_consensus_state-test, which had some various
assertions against setting pending configuration changes with opids
set. After looking at this test, I realized that it's purporting to test
ReplicaState but in fact all of the methods that it tests are just
pass-throughs to ConsensusMeta and thus are already covered by
consensus_meta-test. So, I figured there's no sense spending time
updating this redundant test code.

I ran raft_consensus-itest 1000 times[1]and the new test alone another
1000 times each in DEBUG[2] and TSAN[3] to test.

[1] http://dist-test.cloudera.org/job?job_id=todd.1478141668.26073
[2] http://dist-test.cloudera.org/job?job_id=todd.1478291990.1887
[3] http://dist-test.cloudera.org/job?job_id=todd.1478292124.2771

Change-Id: I2b4e4eb1d60ef66527edf33357fabc22f4b5b32f
Reviewed-on: http://gerrit.cloudera.org:8080/4916
Reviewed-by: Mike Percy <mpercy@apache.org>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/cf976a40
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/cf976a40
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/cf976a40

Branch: refs/heads/master
Commit: cf976a40e05389dd2f3de098835c09c50a7526b5
Parents: 141de33
Author: Todd Lipcon <todd@apache.org>
Authored: Wed Nov 2 12:01:12 2016 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Tue Nov 8 17:36:20 2016 +0000

----------------------------------------------------------------------
 src/kudu/consensus/CMakeLists.txt               |   1 -
 src/kudu/consensus/consensus_meta-test.cc       |   1 +
 src/kudu/consensus/quorum_util.cc               |  19 +---
 src/kudu/consensus/raft_consensus.cc            | 101 +++++++++-------
 src/kudu/consensus/raft_consensus_state-test.cc | 114 -------------------
 src/kudu/consensus/raft_consensus_state.cc      |   5 +-
 .../integration-tests/cluster_itest_util.cc     |   5 +
 .../integration-tests/raft_consensus-itest.cc   |  48 ++++++++
 8 files changed, 119 insertions(+), 175 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/cf976a40/src/kudu/consensus/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/CMakeLists.txt b/src/kudu/consensus/CMakeLists.txt
index da4cc65..a2d08a6 100644
--- a/src/kudu/consensus/CMakeLists.txt
+++ b/src/kudu/consensus/CMakeLists.txt
@@ -135,7 +135,6 @@ ADD_KUDU_TEST(log_index-test)
 ADD_KUDU_TEST(mt-log-test)
 ADD_KUDU_TEST(quorum_util-test)
 ADD_KUDU_TEST(raft_consensus_quorum-test)
-ADD_KUDU_TEST(raft_consensus_state-test)
 
 # Our current version of gmock overrides virtual functions without adding
 # the 'override' keyword which, since our move to c++11, make the compiler

http://git-wip-us.apache.org/repos/asf/kudu/blob/cf976a40/src/kudu/consensus/consensus_meta-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/consensus_meta-test.cc b/src/kudu/consensus/consensus_meta-test.cc
index 7d5f652..a548037 100644
--- a/src/kudu/consensus/consensus_meta-test.cc
+++ b/src/kudu/consensus/consensus_meta-test.cc
@@ -202,6 +202,7 @@ TEST_F(ConsensusMetadataTest, TestToConsensusStatePB) {
 
   uuids.push_back(peer_uuid);
   RaftConfigPB pending_config = BuildConfig(uuids);
+  pending_config.set_opid_index(2);
 
   // Set the pending configuration to be one containing the current leader (who is not
   // in the committed configuration). Ensure that the leader shows up when we ask for

http://git-wip-us.apache.org/repos/asf/kudu/blob/cf976a40/src/kudu/consensus/quorum_util.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/quorum_util.cc b/src/kudu/consensus/quorum_util.cc
index 5db7fff..41572d5 100644
--- a/src/kudu/consensus/quorum_util.cc
+++ b/src/kudu/consensus/quorum_util.cc
@@ -133,20 +133,11 @@ Status VerifyRaftConfig(const RaftConfigPB& config, RaftConfigState
type) {
                    config.ShortDebugString()));
   }
 
-  if (type == COMMITTED_QUORUM) {
-    // Committed configurations must have 'opid_index' populated.
-    if (!config.has_opid_index()) {
-      return Status::IllegalState(
-          Substitute("Committed configs must have opid_index set. RaftConfig: $0",
-                     config.ShortDebugString()));
-    }
-  } else if (type == UNCOMMITTED_QUORUM) {
-    // Uncommitted configurations must *not* have 'opid_index' populated.
-    if (config.has_opid_index()) {
-      return Status::IllegalState(
-          Substitute("Uncommitted configs must not have opid_index set. RaftConfig: $0",
-                     config.ShortDebugString()));
-    }
+  // All configurations must have 'opid_index' populated.
+  if (!config.has_opid_index()) {
+    return Status::IllegalState(
+        Substitute("Configs must have opid_index set. RaftConfig: $0",
+                   config.ShortDebugString()));
   }
 
   for (const RaftPeerPB& peer : config.peers()) {

http://git-wip-us.apache.org/repos/asf/kudu/blob/cf976a40/src/kudu/consensus/raft_consensus.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/raft_consensus.cc b/src/kudu/consensus/raft_consensus.cc
index 956fb6d..385a2c1 100644
--- a/src/kudu/consensus/raft_consensus.cc
+++ b/src/kudu/consensus/raft_consensus.cc
@@ -234,7 +234,7 @@ RaftConsensus::RaftConsensus(
       txn_factory_(txn_factory),
       peer_manager_(std::move(peer_manager)),
       queue_(std::move(queue)),
-      pending_(Substitute("T $0 P $1", options.tablet_id, peer_uuid)),
+      pending_(Substitute("T $0 P $1: ", options.tablet_id, peer_uuid)),
       rng_(GetRandomSeed32()),
       failure_monitor_(GetRandomSeed32(), GetFailureMonitorCheckMeanMs(),
                        GetFailureMonitorCheckStddevMs()),
@@ -604,35 +604,40 @@ Status RaftConsensus::AddPendingOperationUnlocked(const scoped_refptr<ConsensusR
   // If we are adding a pending config change, we need to propagate it to the
   // metadata.
   if (PREDICT_FALSE(round->replicate_msg()->op_type() == CHANGE_CONFIG_OP)) {
-    DCHECK(round->replicate_msg()->change_config_record().has_old_config());
-    DCHECK(round->replicate_msg()->change_config_record().old_config().has_opid_index());
-    DCHECK(round->replicate_msg()->change_config_record().has_new_config());
-    DCHECK(!round->replicate_msg()->change_config_record().new_config().has_opid_index());
-    const RaftConfigPB& old_config = round->replicate_msg()->change_config_record().old_config();
-    const RaftConfigPB& new_config = round->replicate_msg()->change_config_record().new_config();
-    if (state_->GetActiveRoleUnlocked() != RaftPeerPB::LEADER) {
-      // The leader has to mark the configuration as pending before it gets here
-      // because the active configuration affects the replication queue.
-      // Do one last sanity check.
-      Status s = state_->CheckNoConfigChangePendingUnlocked();
-      if (PREDICT_FALSE(!s.ok())) {
-        s = s.CloneAndAppend(Substitute("\n  New config: $0", new_config.ShortDebugString()));
-        LOG_WITH_PREFIX_UNLOCKED(INFO) << s.ToString();
-        return s;
-      }
-      // Check if the pending Raft config has an OpId less than the committed
-      // config. If so, this is a replay at startup in which the COMMIT
-      // messages were delayed.
-      const RaftConfigPB& committed_config = state_->GetCommittedConfigUnlocked();
-      if (round->replicate_msg()->id().index() > committed_config.opid_index())
{
-        CHECK_OK(state_->SetPendingConfigUnlocked(new_config));
-      } else {
-        LOG_WITH_PREFIX_UNLOCKED(INFO) << "Ignoring setting pending config change with
OpId "
-            << round->replicate_msg()->id() << " because the committed
config has OpId index "
-            << committed_config.opid_index() << ". The config change we are ignoring
is: "
-            << "Old config: { " << old_config.ShortDebugString() << " }.
"
-            << "New config: { " << new_config.ShortDebugString() << " }";
+    // Fill in the opid for the proposed new configuration. This has to be done
+    // here rather than when it's first created because we didn't yet have an
+    // OpId assigned at creation time.
+    ChangeConfigRecordPB* change_record = round->replicate_msg()->mutable_change_config_record();
+    change_record->mutable_new_config()->set_opid_index(round->replicate_msg()->id().index());
+
+    DCHECK(change_record->IsInitialized())
+        << "change_config_record missing required fields: "
+        << change_record->InitializationErrorString();
+
+    const RaftConfigPB& new_config = change_record->new_config();
+
+    Status s = state_->CheckNoConfigChangePendingUnlocked();
+    if (PREDICT_FALSE(!s.ok())) {
+      s = s.CloneAndAppend(Substitute("\n  New config: $0", new_config.ShortDebugString()));
+      LOG_WITH_PREFIX_UNLOCKED(INFO) << s.ToString();
+      return s;
+    }
+    // Check if the pending Raft config has an OpId less than the committed
+    // config. If so, this is a replay at startup in which the COMMIT
+    // messages were delayed.
+    const RaftConfigPB& committed_config = state_->GetCommittedConfigUnlocked();
+    if (round->replicate_msg()->id().index() > committed_config.opid_index()) {
+      RETURN_NOT_OK(state_->SetPendingConfigUnlocked(new_config));
+      if (state_->GetActiveRoleUnlocked() == RaftPeerPB::LEADER) {
+        RETURN_NOT_OK(RefreshConsensusQueueAndPeersUnlocked());
       }
+    } else {
+      LOG_WITH_PREFIX_UNLOCKED(INFO)
+          << "Ignoring setting pending config change with OpId "
+          << round->replicate_msg()->id() << " because the committed config
has OpId index "
+          << committed_config.opid_index() << ". The config change we are ignoring
is: "
+          << "Old config: { " << change_record->old_config().ShortDebugString()
<< " }. "
+          << "New config: { " << new_config.ShortDebugString() << " }";
     }
   }
 
@@ -1561,12 +1566,12 @@ void RaftConsensus::Shutdown() {
   // We must close the queue after we close the peers.
   queue_->Close();
 
-  CHECK_OK(pending_.CancelPendingTransactions());
 
   {
     ReplicaState::UniqueLock lock;
     CHECK_OK(state_->LockForShutdown(&lock));
     CHECK_EQ(ReplicaState::kShuttingDown, state_->state());
+    CHECK_OK(pending_.CancelPendingTransactions());
     CHECK_OK(state_->ShutdownUnlocked());
     LOG_WITH_PREFIX_UNLOCKED(INFO) << "Raft consensus is shut down!";
   }
@@ -1781,9 +1786,6 @@ Status RaftConsensus::ReplicateConfigChangeUnlocked(const RaftConfigPB&
old_conf
                                              Unretained(round.get()),
                                              client_cb));
 
-  // Set as pending.
-  RETURN_NOT_OK(state_->SetPendingConfigUnlocked(new_config));
-  RETURN_NOT_OK(RefreshConsensusQueueAndPeersUnlocked());
   CHECK_OK(AppendNewRoundToQueueUnlocked(round));
   return Status::OK();
 }
@@ -2050,26 +2052,41 @@ void RaftConsensus::NonTxRoundReplicationFinished(ConsensusRound*
round,
 }
 
 void RaftConsensus::CompleteConfigChangeRoundUnlocked(ConsensusRound* round, const Status&
status) {
+  const OpId& op_id = round->replicate_msg()->id();
+
   if (!status.ok()) {
-    // Abort a failed config change.
-    CHECK(state_->IsConfigChangePendingUnlocked())
-        << LogPrefixUnlocked() << "Aborting CHANGE_CONFIG_OP but "
-        << "there was no pending config set. Op: "
-        << round->replicate_msg()->ShortDebugString();
-    state_->ClearPendingConfigUnlocked();
+    // If the config change being aborted is the current pending one, abort it.
+    if (state_->IsConfigChangePendingUnlocked() &&
+        state_->GetPendingConfigUnlocked().opid_index() == op_id.index()) {
+      LOG_WITH_PREFIX_UNLOCKED(INFO) << "Aborting config change with OpId "
+                                     << op_id << ": " << status.ToString();
+      state_->ClearPendingConfigUnlocked();
+    } else {
+      LOG_WITH_PREFIX_UNLOCKED(INFO)
+          << "Skipping abort of non-pending config change with OpId "
+          << op_id << ": " << status.ToString();
+    }
+
+    // It's possible to abort a config change which isn't the pending one in the following
+    // sequence:
+    // - replicate a config change
+    // - it gets committed, so we write the new config to disk as the Committed configuration
+    // - we crash before the COMMIT message hits the WAL
+    // - we restart the server, and the config change is added as a pending round again,
+    //   but isn't set as Pending because it's already committed.
+    // - we delete the tablet before committing it
+    // See KUDU-1735.
     return;
   }
 
   // Commit the successful config change.
-  const OpId& op_id = round->replicate_msg()->id();
 
   DCHECK(round->replicate_msg()->change_config_record().has_old_config());
   DCHECK(round->replicate_msg()->change_config_record().has_new_config());
   RaftConfigPB old_config = round->replicate_msg()->change_config_record().old_config();
   RaftConfigPB new_config = round->replicate_msg()->change_config_record().new_config();
   DCHECK(old_config.has_opid_index());
-  DCHECK(!new_config.has_opid_index());
-  new_config.set_opid_index(op_id.index());
+  DCHECK(new_config.has_opid_index());
   // Check if the pending Raft config has an OpId less than the committed
   // config. If so, this is a replay at startup in which the COMMIT
   // messages were delayed.

http://git-wip-us.apache.org/repos/asf/kudu/blob/cf976a40/src/kudu/consensus/raft_consensus_state-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/raft_consensus_state-test.cc b/src/kudu/consensus/raft_consensus_state-test.cc
deleted file mode 100644
index 0665e03..0000000
--- a/src/kudu/consensus/raft_consensus_state-test.cc
+++ /dev/null
@@ -1,114 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-#include "kudu/consensus/raft_consensus_state.h"
-
-#include <gtest/gtest.h>
-#include <memory>
-#include <vector>
-
-#include "kudu/consensus/consensus-test-util.h"
-#include "kudu/consensus/consensus.pb.h"
-#include "kudu/consensus/consensus_meta.h"
-#include "kudu/fs/fs_manager.h"
-#include "kudu/util/test_macros.h"
-#include "kudu/util/test_util.h"
-
-namespace kudu {
-namespace consensus {
-
-using std::unique_ptr;
-
-// TODO(mpercy): Share a test harness with ConsensusMetadataTest?
-const char* kTabletId = "TestTablet";
-
-class RaftConsensusStateTest : public KuduTest {
- public:
-  RaftConsensusStateTest()
-      : fs_manager_(env_.get(), GetTestPath("fs_root")) {
-  }
-
-  virtual void SetUp() OVERRIDE {
-    KuduTest::SetUp();
-    ASSERT_OK(fs_manager_.CreateInitialFileSystemLayout());
-    ASSERT_OK(fs_manager_.Open());
-
-    // Initialize test configuration.
-    config_.set_opid_index(kInvalidOpIdIndex);
-    RaftPeerPB* peer = config_.add_peers();
-    peer->set_permanent_uuid(fs_manager_.uuid());
-    peer->set_member_type(RaftPeerPB::VOTER);
-
-    unique_ptr<ConsensusMetadata> cmeta;
-    ASSERT_OK(ConsensusMetadata::Create(&fs_manager_, kTabletId, fs_manager_.uuid(),
-                                        config_, kMinimumTerm, &cmeta));
-    state_.reset(new ReplicaState(ConsensusOptions(), fs_manager_.uuid(), std::move(cmeta)));
-
-    // Start up the ReplicaState.
-    ReplicaState::UniqueLock lock;
-    ASSERT_OK(state_->LockForStart(&lock));
-    ASSERT_OK(state_->StartUnlocked(MinimumOpId()));
-  }
-
- protected:
-  FsManager fs_manager_;
-  RaftConfigPB config_;
-  gscoped_ptr<ReplicaState> state_;
-};
-
-// Test that we can transition a new configuration from a pending state into a
-// persistent state.
-TEST_F(RaftConsensusStateTest, TestPendingPersistent) {
-  ReplicaState::UniqueLock lock;
-  ASSERT_OK(state_->LockForConfigChange(&lock));
-
-  config_.clear_opid_index();
-  ASSERT_OK(state_->SetPendingConfigUnlocked(config_));
-  ASSERT_TRUE(state_->IsConfigChangePendingUnlocked());
-  ASSERT_FALSE(state_->GetPendingConfigUnlocked().has_opid_index());
-  ASSERT_TRUE(state_->GetCommittedConfigUnlocked().has_opid_index());
-
-  ASSERT_FALSE(state_->SetCommittedConfigUnlocked(config_).ok());
-  config_.set_opid_index(1);
-  ASSERT_TRUE(state_->SetCommittedConfigUnlocked(config_).ok());
-
-  ASSERT_FALSE(state_->IsConfigChangePendingUnlocked());
-  ASSERT_EQ(1, state_->GetCommittedConfigUnlocked().opid_index());
-}
-
-// Ensure that we can set persistent configurations directly.
-TEST_F(RaftConsensusStateTest, TestPersistentWrites) {
-  ReplicaState::UniqueLock lock;
-  ASSERT_OK(state_->LockForConfigChange(&lock));
-
-  ASSERT_FALSE(state_->IsConfigChangePendingUnlocked());
-  ASSERT_EQ(kInvalidOpIdIndex, state_->GetCommittedConfigUnlocked().opid_index());
-
-  config_.clear_opid_index();
-  ASSERT_OK(state_->SetPendingConfigUnlocked(config_));
-  config_.set_opid_index(1);
-  ASSERT_OK(state_->SetCommittedConfigUnlocked(config_));
-  ASSERT_EQ(1, state_->GetCommittedConfigUnlocked().opid_index());
-
-  config_.clear_opid_index();
-  ASSERT_OK(state_->SetPendingConfigUnlocked(config_));
-  config_.set_opid_index(2);
-  ASSERT_OK(state_->SetCommittedConfigUnlocked(config_));
-  ASSERT_EQ(2, state_->GetCommittedConfigUnlocked().opid_index());
-}
-
-}  // namespace consensus
-}  // namespace kudu

http://git-wip-us.apache.org/repos/asf/kudu/blob/cf976a40/src/kudu/consensus/raft_consensus_state.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/raft_consensus_state.cc b/src/kudu/consensus/raft_consensus_state.cc
index 7b9454a..a0b38d4 100644
--- a/src/kudu/consensus/raft_consensus_state.cc
+++ b/src/kudu/consensus/raft_consensus_state.cc
@@ -222,13 +222,10 @@ Status ReplicaState::SetCommittedConfigUnlocked(const RaftConfigPB&
committed_co
                         "Invalid config to set as committed");
 
   // Compare committed with pending configuration, ensure they are the same.
-  // Pending will not have an opid_index, so ignore that field.
   DCHECK(cmeta_->has_pending_config());
-  RaftConfigPB config_no_opid = committed_config;
-  config_no_opid.clear_opid_index();
   const RaftConfigPB& pending_config = GetPendingConfigUnlocked();
   // Quorums must be exactly equal, even w.r.t. peer ordering.
-  CHECK_EQ(GetPendingConfigUnlocked().SerializeAsString(), config_no_opid.SerializeAsString())
+  CHECK_EQ(GetPendingConfigUnlocked().SerializeAsString(), committed_config.SerializeAsString())
       << Substitute("New committed config must equal pending config, but does not.
"
                     "Pending config: $0, committed config: $1",
                     pending_config.ShortDebugString(), committed_config.ShortDebugString());

http://git-wip-us.apache.org/repos/asf/kudu/blob/cf976a40/src/kudu/integration-tests/cluster_itest_util.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/cluster_itest_util.cc b/src/kudu/integration-tests/cluster_itest_util.cc
index 5af760e..898117a 100644
--- a/src/kudu/integration-tests/cluster_itest_util.cc
+++ b/src/kudu/integration-tests/cluster_itest_util.cc
@@ -671,6 +671,11 @@ Status WaitForNumTabletsOnTS(TServerDetails* ts,
                              int count,
                              const MonoDelta& timeout,
                              vector<ListTabletsResponsePB::StatusAndSchemaPB>* tablets)
{
+  // If the user doesn't care about collecting the resulting tablets, collect into a local
+  // vector.
+  vector<ListTabletsResponsePB::StatusAndSchemaPB> tablets_local;
+  if (tablets == nullptr) tablets = &tablets_local;
+
   Status s;
   MonoTime deadline = MonoTime::Now() + timeout;
   while (true) {

http://git-wip-us.apache.org/repos/asf/kudu/blob/cf976a40/src/kudu/integration-tests/raft_consensus-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/raft_consensus-itest.cc b/src/kudu/integration-tests/raft_consensus-itest.cc
index 1acb63b..41dedb0 100644
--- a/src/kudu/integration-tests/raft_consensus-itest.cc
+++ b/src/kudu/integration-tests/raft_consensus-itest.cc
@@ -2565,6 +2565,54 @@ TEST_F(RaftConsensusITest, TestChangeConfigRejectedUnlessNoopReplicated)
{
   ASSERT_STR_CONTAINS(s.ToString(), "Leader has not yet committed an operation in its own
term");
 }
 
+// Regression test for KUDU-1735, a crash in the case where a pending
+// config change operation is aborted during tablet deletion when that config change
+// was in fact already persisted to disk.
+TEST_F(RaftConsensusITest, Test_KUDU_1735) {
+  MonoDelta kTimeout = MonoDelta::FromSeconds(10);
+  vector<string> ts_flags = { "--enable_leader_failure_detection=false" };
+  vector<string> master_flags = { "--catalog_manager_wait_for_new_tablets_to_elect_leader=false"
};
+  NO_FATALS(BuildAndStart(ts_flags, master_flags));
+
+  vector<TServerDetails*> tservers;
+  vector<ExternalTabletServer*> external_tservers;
+  AppendValuesFromMap(tablet_servers_, &tservers);
+  for (TServerDetails* ts : tservers) {
+    external_tservers.push_back(cluster_->tablet_server_by_uuid(ts->uuid()));
+  }
+
+  // Elect server 0 as leader and wait for log index 1 to propagate to all servers.
+  TServerDetails* leader_tserver = tservers[0];
+  ASSERT_OK(StartElection(leader_tserver, tablet_id_, kTimeout));
+  ASSERT_OK(WaitUntilLeader(leader_tserver, tablet_id_, kTimeout));
+  ASSERT_OK(WaitForServersToAgree(MonoDelta::FromSeconds(10), tablet_servers_,
+                                  tablet_id_, 1));
+
+  // Make follower tablet servers crash before writing a commit message.
+  for (int i = 1; i < cluster_->num_tablet_servers(); i++) {
+    ASSERT_OK(cluster_->SetFlag(external_tservers[i], "fault_crash_before_append_commit",
"1.0"));
+  }
+
+  // Run a config change. This will cause the other servers to crash with pending config
+  // change operations due to the above fault injection.
+  ASSERT_OK(RemoveServer(leader_tserver, tablet_id_, tservers[1], boost::none, kTimeout));
+  for (int i = 1; i < cluster_->num_tablet_servers(); i++) {
+    ASSERT_OK(external_tservers[i]->WaitForCrash(kTimeout));
+  }
+
+  // Delete the table, so that when we restart the crashed servers, they'll get RPCs to
+  // delete tablets while config changes are pending.
+  ASSERT_OK(client_->DeleteTable(kTableId));
+
+  // Restart the crashed tservers and wait for them to delete their replicas.
+  for (int i = 1; i < cluster_->num_tablet_servers(); i++) {
+    auto* ts = external_tservers[i];
+    ts->Shutdown();
+    ASSERT_OK(ts->Restart());
+    ASSERT_OK(WaitForNumTabletsOnTS(tservers[i], 0, kTimeout, nullptr));
+  }
+}
+
 // Test that if for some reason none of the transactions can be prepared, that it will come
 // back as an error in UpdateConsensus().
 TEST_F(RaftConsensusITest, TestUpdateConsensusErrorNonePrepared) {


Mime
View raw message