kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mpe...@apache.org
Subject [1/2] incubator-kudu git commit: catalog_manager: prevent spurious dirty callbacks from crashing the process
Date Fri, 24 Jun 2016 00:41:16 GMT
Repository: incubator-kudu
Updated Branches:
  refs/heads/master 630a83bd5 -> 47c023f5e


catalog_manager: prevent spurious dirty callbacks from crashing the process

The recent change to switch from LocalConsensus to RaftConsensus has led to
an increased number of dirty callback calls when using just one master. Some
of these calls occur with no term change; the catalog manager treats them as
any other calls and reloads the on-disk metadata.

In theory that's just unnecessary work, but CheckIsLeaderAndReady() doesn't
provide adequate protection for the rest of the master when in this state, so
nearly every RPC is allowed in during this time. That's an absolute disaster
for correctness: imagine a GetTableLocations() returning only a subset of a
table's tablets because the rest were still being loaded from disk.

Luckily for us it can also manifest as a crash [1] so we noticed it quickly.
I chose to fix it by ignoring these calls within
CatalogManager::VisitTablesAndTabletsTask and not
SysCatalogTable::SysCatalogStateChanged because the synchronization in the
former is more straight forward thanks to the size of worker_pool_. The new
test led to a crash 100% of the time without this fix.

There's an argument to be made for changing TableInfo::TabletMap's raw
pointers to shared pointers thus avoiding this crash altogether, but it's
still an incorrect state to be in, so I don't see the value in doing that.

While I was here, I snuck a few other changes in:
- Remove a lock acquisition from ElectedAsLeaderCb; it did nothing.
- Remove old_role_ from SysCatalogTable; it also did nothing.
- Narrow the lock acquisition in VisitTablesAndTabletsTask; it only needs
  to protect the visiting logic.

1. Sample crash output:

F0618 05:47:41.795367  9330 ref_counted.cc:74] Check failed: !in_dtor_
*** Check failure stack trace: ***
    @     0x7f99fab05f7d  google::LogMessage::Fail() at ??:0
    @     0x7f99fab07e7d  google::LogMessage::SendToLog() at ??:0
    @     0x7f99fab05ab9  google::LogMessage::Flush() at ??:0
    @     0x7f99fab0891f  google::LogMessageFatal::~LogMessageFatal() at ??:0
    @     0x7f99fae8a637  kudu::subtle::RefCountedThreadSafeBase::AddRef() at ??:0
    @     0x7f9a07c486d8  make_scoped_refptr<>() at ??:0
    @     0x7f9a07c2f7f7  kudu::master::TableInfo::GetTabletsInRange() at ??:0
    @     0x7f9a07c2ec35  kudu::master::CatalogManager::GetTableLocations() at ??:0
    @           0x51962e  kudu::master::WaitForRunningTabletCount() at ...
    @           0x51ce59  kudu::CreateTableStressTest_RestartMasterDuringCreation_Test::TestBody()
at ...
    @     0x7f99fc7f5b48  testing::internal::HandleExceptionsInMethodIfSupported<>()
at ??:0
    @     0x7f99fc7ea012  testing::Test::Run() at ??:0
    @     0x7f99fc7ea158  testing::TestInfo::Run() at ??:0
    @     0x7f99fc7ea235  testing::TestCase::Run() at ??:0
    @     0x7f99fc7ea518  testing::internal::UnitTestImpl::RunAllTests() at ??:0
    @     0x7f99fc7f6058  testing::internal::HandleExceptionsInMethodIfSupported<>()
at ??:0
    @     0x7f99fc7ea7fd  testing::UnitTest::Run() at ??:0
    @     0x7f9a08520cf6  main at ??:0
    @     0x7f99f97a9d5d  __libc_start_main at ??:0
    @           0x43e06d  (unknown) at ??:0

Change-Id: I5bc602a51a4914ec13f926bfca555c816a2ee509
Reviewed-on: http://gerrit.cloudera.org:8080/3465
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>
Reviewed-by: Mike Percy <mpercy@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/c266702b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/c266702b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/c266702b

Branch: refs/heads/master
Commit: c266702b908bb2a537fd30d8c4bb2e5cf8144cfc
Parents: 630a83b
Author: Adar Dembo <adar@cloudera.com>
Authored: Wed Jun 22 19:29:19 2016 -0700
Committer: Mike Percy <mpercy@apache.org>
Committed: Fri Jun 24 00:40:44 2016 +0000

----------------------------------------------------------------------
 src/kudu/master/catalog_manager.cc | 46 +++++++++++++++++------------
 src/kudu/master/catalog_manager.h  | 25 ++++------------
 src/kudu/master/master-test.cc     | 52 +++++++++++++++++++++++++++++----
 src/kudu/master/sys_catalog.cc     |  6 ++--
 src/kudu/master/sys_catalog.h      |  1 -
 5 files changed, 82 insertions(+), 48 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/c266702b/src/kudu/master/catalog_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/master/catalog_manager.cc b/src/kudu/master/catalog_manager.cc
index 8aeb78e..0a21c5d 100644
--- a/src/kudu/master/catalog_manager.cc
+++ b/src/kudu/master/catalog_manager.cc
@@ -533,6 +533,9 @@ CatalogManager::CatalogManager(Master *master)
     state_(kConstructed),
     leader_ready_term_(-1) {
   CHECK_OK(ThreadPoolBuilder("leader-initialization")
+           // Presently, this thread pool must contain only a single thread
+           // (to correctly serialize invocations of ElectedAsLeaderCb upon
+           // closely timed consecutive elections).
            .set_max_threads(1)
            .Build(&worker_pool_));
 }
@@ -574,7 +577,6 @@ Status CatalogManager::Init(bool is_first_run) {
 }
 
 Status CatalogManager::ElectedAsLeaderCb() {
-  std::lock_guard<simple_spinlock> l(state_lock_);
   return worker_pool_->SubmitClosure(
       Bind(&CatalogManager::VisitTablesAndTabletsTask, Unretained(this)));
 }
@@ -595,9 +597,20 @@ Status CatalogManager::WaitUntilCaughtUpAsLeader(const MonoDelta&
timeout) {
 }
 
 void CatalogManager::VisitTablesAndTabletsTask() {
-
   Consensus* consensus = sys_catalog_->tablet_peer()->consensus();
   int64_t term = consensus->ConsensusState(CONSENSUS_CONFIG_COMMITTED).current_term();
+  {
+    std::lock_guard<simple_spinlock> l(state_lock_);
+    if (leader_ready_term_ == term) {
+      // The term hasn't changed since the last time this master was the
+      // leader. It's not possible for another master to be leader for the same
+      // term, so there hasn't been any actual leadership change and thus
+      // there's no reason to reload the on-disk metadata.
+      VLOG(2) << Substitute("Term $0 hasn't changed, ignoring dirty callback",
+                            term);
+      return;
+    }
+  }
   Status s = WaitUntilCaughtUpAsLeader(
       MonoDelta::FromMilliseconds(FLAGS_master_failover_catchup_timeout_ms));
   if (!s.ok()) {
@@ -610,28 +623,25 @@ void CatalogManager::VisitTablesAndTabletsTask() {
     return;
   }
 
-  {
-    std::lock_guard<LockType> lock(lock_);
-    int64_t term_after_wait = consensus->ConsensusState(CONSENSUS_CONFIG_COMMITTED).current_term();
-    if (term_after_wait != term) {
-      // If we got elected leader again while waiting to catch up then we will
-      // get another callback to visit the tables and tablets, so bail.
-      LOG(INFO) << "Term change from " << term << " to " << term_after_wait
-                << " while waiting for master leader catchup. Not loading sys catalog
metadata";
-      return;
-    }
+  int64_t term_after_wait = consensus->ConsensusState(CONSENSUS_CONFIG_COMMITTED).current_term();
+  if (term_after_wait != term) {
+    // If we got elected leader again while waiting to catch up then we will
+    // get another callback to visit the tables and tablets, so bail.
+    LOG(INFO) << "Term change from " << term << " to " << term_after_wait
+        << " while waiting for master leader catchup. Not loading sys catalog metadata";
+    return;
+  }
 
-    LOG(INFO) << "Loading table and tablet metadata into memory...";
-    LOG_SLOW_EXECUTION(WARNING, 1000, LogPrefix() + "Loading metadata into memory") {
-      CHECK_OK(VisitTablesAndTabletsUnlocked());
-    }
+  LOG(INFO) << "Loading table and tablet metadata into memory...";
+  LOG_SLOW_EXECUTION(WARNING, 1000, LogPrefix() + "Loading metadata into memory") {
+    CHECK_OK(VisitTablesAndTablets());
   }
   std::lock_guard<simple_spinlock> l(state_lock_);
   leader_ready_term_ = term;
 }
 
-Status CatalogManager::VisitTablesAndTabletsUnlocked() {
-  DCHECK(lock_.is_locked());
+Status CatalogManager::VisitTablesAndTablets() {
+  std::lock_guard<LockType> lock(lock_);
 
   // Clear the existing state.
   table_names_map_.clear();

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/c266702b/src/kudu/master/catalog_manager.h
----------------------------------------------------------------------
diff --git a/src/kudu/master/catalog_manager.h b/src/kudu/master/catalog_manager.h
index 1cc52f0..75d6d8e 100644
--- a/src/kudu/master/catalog_manager.h
+++ b/src/kudu/master/catalog_manager.h
@@ -392,15 +392,16 @@ class CatalogManager : public tserver::TabletPeerLookupIf {
   consensus::RaftPeerPB::Role Role() const;
 
  private:
-  // So that the test can call ElectedAsLeaderCb() directly.
+  // These tests call ElectedAsLeaderCb() directly.
   FRIEND_TEST(MasterTest, TestShutdownDuringTableVisit);
+  FRIEND_TEST(MasterTest, TestGetTableLocationsDuringRepeatedTableVisit);
 
   friend class TableLoader;
   friend class TabletLoader;
 
   // Called by SysCatalog::SysCatalogStateChanged when this node
   // becomes the leader of a consensus configuration. Executes VisitTablesAndTabletsTask
-  // below.
+  // via 'worker_pool_'.
   Status ElectedAsLeaderCb();
 
   // Loops and sleeps until one of the following conditions occurs:
@@ -415,24 +416,14 @@ class CatalogManager : public tserver::TabletPeerLookupIf {
   // reading that data, to ensure consistency across failovers.
   Status WaitUntilCaughtUpAsLeader(const MonoDelta& timeout);
 
-  // This method is submitted to 'leader_initialization_pool_' by
-  // ElectedAsLeaderCb above. It:
-  // 1) Acquired 'lock_'
-  // 2) Resets 'tables_tablets_visited_status_'
-  // 3) Runs VisitTablesAndTabletsUnlocked below
-  // 4) Sets 'tables_tablets_visited_status_' to return value of
-  // the call to VisitTablesAndTabletsUnlocked.
-  // 5) Releases 'lock_' and if successful, updates 'leader_ready_term_'
-  // to true (under state_lock_).
+  // Performs several checks before calling VisitTablesAndTablets to actually
+  // reload table/tablet metadata into memory.
   void VisitTablesAndTabletsTask();
 
   // Clears out the existing metadata ('table_names_map_', 'table_ids_map_',
   // and 'tablet_map_'), loads tables metadata into memory and if successful
   // loads the tablets metadata.
-  //
-  // NOTE: Must be called under external synchronization, see
-  // VisitTablesAndTabletsTask() above.
-  Status VisitTablesAndTabletsUnlocked();
+  Status VisitTablesAndTablets();
 
   // Helper for initializing 'sys_catalog_'. After calling this
   // method, the caller should call WaitUntilRunning() on sys_catalog_
@@ -624,10 +615,6 @@ class CatalogManager : public tserver::TabletPeerLookupIf {
 
   // Used to defer work from reactor threads onto a thread where
   // blocking behavior is permissible.
-  //
-  // NOTE: Presently, this thread pool must contain only a single
-  // thread (to correctly serialize invocations of ElectedAsLeaderCb
-  // upon closely timed consecutive elections).
   gscoped_ptr<ThreadPool> worker_pool_;
 
   // This field is updated when a node becomes leader master,

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/c266702b/src/kudu/master/master-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/master/master-test.cc b/src/kudu/master/master-test.cc
index 810507f..f3414d5 100644
--- a/src/kudu/master/master-test.cc
+++ b/src/kudu/master/master-test.cc
@@ -582,11 +582,51 @@ TEST_F(MasterTest, TestShutdownDuringTableVisit) {
   // CatalogManager::VisitTablesAndTabletsTask.
 }
 
-static void GetTableSchema(const char* kTableName,
-                           const Schema* kSchema,
-                           MasterServiceProxy* proxy,
-                           CountDownLatch* started,
-                           AtomicBool* done) {
+// Hammers the master with GetTableLocations() calls.
+static void LoopGetTableLocations(const char* kTableName,
+                                  MasterServiceProxy* proxy,
+                                  AtomicBool* done) {
+  GetTableLocationsRequestPB req;
+  GetTableLocationsResponsePB resp;
+  req.mutable_table()->set_table_name(kTableName);
+
+  while (!done->Load()) {
+    RpcController controller;
+    CHECK_OK(proxy->GetTableLocations(req, &resp, &controller));
+  }
+}
+
+// Tests that the catalog manager handles spurious calls to ElectedAsLeaderCb()
+// (i.e. those without a term change) correctly by ignoring them. If they
+// aren't ignored, a concurrent GetTableLocations() call may trigger a
+// use-after-free.
+TEST_F(MasterTest, TestGetTableLocationsDuringRepeatedTableVisit) {
+  const char* kTableName = "test";
+  Schema schema({ ColumnSchema("key", INT32) }, 1);
+  ASSERT_OK(CreateTable(kTableName, schema));
+
+  AtomicBool done(false);
+  scoped_refptr<Thread> t;
+  ASSERT_OK(Thread::Create("test", "getTableLocationsThread",
+                           &LoopGetTableLocations, kTableName,
+                           proxy_.get(), &done, &t));
+
+  // Call ElectedAsLeaderCb() repeatedly. If these spurious calls aren't
+  // ignored, the concurrent GetTableLocations() calls may crash the master.
+  for (int i = 0; i < 100; i++) {
+    master_->catalog_manager()->ElectedAsLeaderCb();
+  }
+  done.Store(true);
+  t->Join();
+}
+
+// Hammers the master with GetTableSchema() calls, checking that the results
+// make sense.
+static void LoopGetTableSchema(const char* kTableName,
+                               const Schema* kSchema,
+                               MasterServiceProxy* proxy,
+                               CountDownLatch* started,
+                               AtomicBool* done) {
   GetTableSchemaRequestPB req;
   GetTableSchemaResponsePB resp;
   req.mutable_table()->set_table_name(kTableName);
@@ -634,7 +674,7 @@ TEST_F(MasterTest, TestGetTableSchemaIsAtomicWithCreateTable) {
   // Kick off a thread that calls GetTableSchema() in a loop.
   scoped_refptr<Thread> t;
   ASSERT_OK(Thread::Create("test", "test",
-                           &GetTableSchema, kTableName, &kTableSchema,
+                           &LoopGetTableSchema, kTableName, &kTableSchema,
                            proxy_.get(), &started, &done, &t));
 
   // Only create the table after the thread has started.

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/c266702b/src/kudu/master/sys_catalog.cc
----------------------------------------------------------------------
diff --git a/src/kudu/master/sys_catalog.cc b/src/kudu/master/sys_catalog.cc
index 6e7acb1..dba2d77 100644
--- a/src/kudu/master/sys_catalog.cc
+++ b/src/kudu/master/sys_catalog.cc
@@ -81,8 +81,7 @@ SysCatalogTable::SysCatalogTable(Master* master, MetricRegistry* metrics,
                                  ElectedLeaderCallback leader_cb)
     : metric_registry_(metrics),
       master_(master),
-      leader_cb_(std::move(leader_cb)),
-      old_role_(RaftPeerPB::FOLLOWER) {
+      leader_cb_(std::move(leader_cb)) {
   CHECK_OK(ThreadPoolBuilder("apply").Build(&apply_pool_));
 }
 
@@ -237,8 +236,7 @@ void SysCatalogTable::SysCatalogStateChanged(const string& tablet_id,
const stri
                         << "Latest consensus state: " << cstate.ShortDebugString();
   RaftPeerPB::Role new_role = GetConsensusRole(tablet_peer_->permanent_uuid(), cstate);
   LOG_WITH_PREFIX(INFO) << "This master's current role is: "
-                        << RaftPeerPB::Role_Name(new_role)
-                        << ", previous role was: " << RaftPeerPB::Role_Name(old_role_);
+                        << RaftPeerPB::Role_Name(new_role);
   if (new_role == RaftPeerPB::LEADER) {
     Status s = leader_cb_.Run();
 

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/c266702b/src/kudu/master/sys_catalog.h
----------------------------------------------------------------------
diff --git a/src/kudu/master/sys_catalog.h b/src/kudu/master/sys_catalog.h
index 643c06d..efb2125 100644
--- a/src/kudu/master/sys_catalog.h
+++ b/src/kudu/master/sys_catalog.h
@@ -208,7 +208,6 @@ class SysCatalogTable {
   Master* master_;
 
   ElectedLeaderCallback leader_cb_;
-  consensus::RaftPeerPB::Role old_role_;
 
   consensus::RaftPeerPB local_peer_pb_;
 };


Mime
View raw message