kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From t...@apache.org
Subject [2/4] incubator-kudu git commit: catalog_manager: further protection for table visitor racing with shutdown
Date Tue, 03 May 2016 00:17:39 GMT
catalog_manager: further protection for table visitor racing with shutdown

One of my patches ran into the following master-test failure, which I think
is due to a table visitor thread writing to the table map while the
main thread iterates over it in Shutdown(). I can't reproduce the failure,
but copying the table map while holding the catalog manager lock (also held
by table visitors) should do the trick.

master-test: /home/jenkins-slave/workspace/kudu-1/src/kudu/gutil/ref_counted.h:273: T* scoped_refptr<T>::operator->()
const [with T = kudu::master::TableInfo]: Assertion `ptr_ != __null' failed.
*** Aborted at 1461811706 (unix time) try "date -d @1461811706" if you are using GNU date
***
PC: @     0x7fd929da3cc9 gsignal
*** SIGABRT (@0x3e8000000a0) received by PID 160 (TID 0x7fd92d79e840) from PID 160; stack
trace: ***
    @     0x7fd929da3d40 (unknown) at ??:0
    @     0x7fd929da3cc9 gsignal at ??:0
    @     0x7fd929da70d8 abort at ??:0
    @     0x7fd929d9cb86 (unknown) at ??:0
    @     0x7fd929d9cc32 __assert_fail at ??:0
    @     0x7fd92d306fdf scoped_refptr<>::operator->() at ??:0
    @     0x7fd92d2df4fb kudu::master::CatalogManager::Shutdown() at ??:0
    @     0x7fd92d327c6b kudu::master::Master::Shutdown() at ??:0
    @     0x7fd92d33c975 kudu::master::MiniMaster::Shutdown() at ??:0
    @           0x466012 kudu::master::MasterTest::TearDown() at /home/jenkins-slave/workspace/kudu-1/src/kudu/master/master-test.cc:75
    @     0x7fd92ab9a909 testing::internal::HandleExceptionsInMethodIfSupported<>()
at ??:0
    @     0x7fd92ab8c3c0 testing::Test::Run() at ??:0
    @     0x7fd92ab8c4ad testing::TestInfo::Run() at ??:0
    @     0x7fd92ab8c5c5 testing::TestCase::Run() at ??:0
    @     0x7fd92ab8c878 testing::internal::UnitTestImpl::RunAllTests() at ??:0
    @     0x7fd92ab8cb19 testing::UnitTest::Run() at ??:0
    @     0x7fd92cfdf181 RUN_ALL_TESTS() at ??:0
    @     0x7fd92cfdec1e main at ??:0
    @     0x7fd929d8eec5 __libc_start_main at ??:0
    @           0x44e069 (unknown) at ??:?
    @                0x0 (unknown)

Change-Id: Ifbae665944f180c8b1354717c8ab9479c83d8fc8
Reviewed-on: http://gerrit.cloudera.org:8080/2891
Tested-by: Adar Dembo <adar@cloudera.com>
Reviewed-by: Todd Lipcon <todd@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/1681d9a7
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/1681d9a7
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/1681d9a7

Branch: refs/heads/master
Commit: 1681d9a73eedaf5bf73fbec91caa81796123af8d
Parents: 4cb3b3a
Author: Adar Dembo <adar@cloudera.com>
Authored: Wed Apr 27 23:35:30 2016 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Tue May 3 00:17:11 2016 +0000

----------------------------------------------------------------------
 src/kudu/master/catalog_manager.cc | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/1681d9a7/src/kudu/master/catalog_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/master/catalog_manager.cc b/src/kudu/master/catalog_manager.cc
index f965d7d..92bd18e 100644
--- a/src/kudu/master/catalog_manager.cc
+++ b/src/kudu/master/catalog_manager.cc
@@ -708,8 +708,18 @@ void CatalogManager::Shutdown() {
     background_tasks_->Shutdown();
   }
 
-  // Abort and Wait tables task completion
-  for (const TableInfoMap::value_type& e : table_ids_map_) {
+  // Mark all outstanding table tasks as aborted and wait for them to fail.
+  //
+  // There may be an outstanding table visitor thread modifying the table map,
+  // so we must make a copy of it before we iterate. It's OK if the visitor
+  // adds more entries to the map even after we finish, but it may not start
+  // any new tasks for those entries.
+  TableInfoMap copy;
+  {
+    boost::lock_guard<simple_spinlock> l(state_lock_);
+    copy = table_ids_map_;
+  }
+  for (const TableInfoMap::value_type &e : copy) {
     e.second->AbortTasks();
     e.second->WaitTasksCompletion();
   }


Mime
View raw message