kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From t...@apache.org
Subject incubator-kudu git commit: catalog_manager: wait for table visitors before shutting down catalog
Date Tue, 08 Mar 2016 02:40:20 GMT
Repository: incubator-kudu
Updated Branches:
  refs/heads/master 1cea8b7e9 -> cef399c10


catalog_manager: wait for table visitors before shutting down catalog

We observed the following crash:

  *** Aborted at 1456910767 (unix time) try "date -d @1456910767" if you are using GNU date
***
  PC: @     0x7f1892a8c3c0 base::subtle::NoBarrier_CompareAndSwap()
  *** SIGSEGV (@0x158) received by PID 6439 (TID 0x7f187b54e700) from PID 344; stack trace:
***
      @       0x3b44e0f710 (unknown) at ??:0
      @     0x7f1892a8c3c0 base::subtle::NoBarrier_CompareAndSwap() at ??:0
      @     0x7f1892a8c440 base::subtle::Acquire_CompareAndSwap() at ??:0
      @     0x7f1892a8c5ca base::SpinLock::Lock() at ??:0
      @     0x7f1892a8ce74 kudu::simple_spinlock::lock() at ??:0
      @     0x7f1892a9a90e boost::lock_guard<>::lock_guard() at ??:0
      @     0x7f1891f2a43a kudu::tablet::MvccManager::TakeSnapshot() at ??:0
      @     0x7f1891f2b00e kudu::tablet::MvccSnapshot::MvccSnapshot() at ??:0
      @     0x7f1891e63755 kudu::tablet::Tablet::NewRowIterator() at ??:0
      @     0x7f1892ade0c1 kudu::master::SysCatalogTable::VisitTablets() at ??:0
      @     0x7f1892a74497 kudu::master::CatalogManager::VisitTablesAndTabletsUnlocked() at
??:0
      @     0x7f1892a73f8e kudu::master::CatalogManager::VisitTablesAndTabletsTask() at ??:0

This can happen if the deferral of VisitTablesAndTabletsTask to the worker
thread pool is slow enough such that the catalog is able to shutdown before
the table visitor starts.

I spent a bunch of time trying to reproduce this crash. The attached test is
simple, but it needs class friendship and I've only seen it trigger the
crash once in thousands of runs. As such, it may hurt more than it helps.

Change-Id: I142b8dbdf4356a324bcde0e63fa44ea63798d509
Reviewed-on: http://gerrit.cloudera.org:8080/2427
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/cef399c1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/cef399c1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/cef399c1

Branch: refs/heads/master
Commit: cef399c10c2ee0b30a849f26a3a22ee21ef0956f
Parents: 1cea8b7
Author: Adar Dembo <adar@cloudera.com>
Authored: Wed Mar 2 18:07:55 2016 -0800
Committer: Todd Lipcon <todd@apache.org>
Committed: Tue Mar 8 02:40:13 2016 +0000

----------------------------------------------------------------------
 src/kudu/master/catalog_manager.cc | 6 ++++++
 src/kudu/master/catalog_manager.h  | 3 +++
 src/kudu/master/master-test.cc     | 9 +++++++++
 3 files changed, 18 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/cef399c1/src/kudu/master/catalog_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/master/catalog_manager.cc b/src/kudu/master/catalog_manager.cc
index ecf85d3..9e02a8a 100644
--- a/src/kudu/master/catalog_manager.cc
+++ b/src/kudu/master/catalog_manager.cc
@@ -609,6 +609,12 @@ void CatalogManager::Shutdown() {
     e.second->WaitTasksCompletion();
   }
 
+  // Wait for any outstanding table visitors to finish.
+  //
+  // Must be done before shutting down the catalog, otherwise its tablet peer
+  // may be destroyed while still in use by a table visitor.
+  worker_pool_->Shutdown();
+
   // Shut down the underlying storage for tables and tablets.
   if (sys_catalog_) {
     sys_catalog_->Shutdown();

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/cef399c1/src/kudu/master/catalog_manager.h
----------------------------------------------------------------------
diff --git a/src/kudu/master/catalog_manager.h b/src/kudu/master/catalog_manager.h
index 49e39f8..ce56c97 100644
--- a/src/kudu/master/catalog_manager.h
+++ b/src/kudu/master/catalog_manager.h
@@ -416,6 +416,9 @@ class CatalogManager : public tserver::TabletPeerLookupIf {
   consensus::RaftPeerPB::Role Role() const;
 
  private:
+  // So that the test can call ElectedAsLeaderCb() directly.
+  FRIEND_TEST(MasterTest, TestShutdownDuringTableVisit);
+
   friend class TableLoader;
   friend class TabletLoader;
 

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/cef399c1/src/kudu/master/master-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/master/master-test.cc b/src/kudu/master/master-test.cc
index 9fb52db..aeebbf1 100644
--- a/src/kudu/master/master-test.cc
+++ b/src/kudu/master/master-test.cc
@@ -452,5 +452,14 @@ TEST_F(MasterTest, TestInvalidGetTableLocations) {
   }
 }
 
+// Tests that if the master is shutdown while a table visitor is active, the
+// shutdown waits for the visitor to finish, avoiding racing and crashing.
+TEST_F(MasterTest, TestShutdownDuringTableVisit) {
+  ASSERT_OK(master_->catalog_manager()->ElectedAsLeaderCb());
+
+  // Master will now shut down, potentially racing with
+  // CatalogManager::VisitTablesAndTabletsTask.
+}
+
 } // namespace master
 } // namespace kudu


Mime
View raw message