kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mpe...@apache.org
Subject kudu git commit: consensus_peers-test: fix flake in TSAN builds
Date Tue, 21 Nov 2017 20:51:30 GMT
Repository: kudu
Updated Branches:
  refs/heads/master 034b2af86 -> f0d52ab64


consensus_peers-test: fix flake in TSAN builds

TSAN builds were significantly flaky with the following error:

==8581==ERROR: AddressSanitizer: heap-use-after-free on address 0x615000003fe8 at pc 0x7fde311d3229
bp 0x7fde16b56a10 sp 0x7fde16b56a08
READ of size 8 at 0x615000003fe8 thread T12 (test-raft-pool )
    #0 0x7fde311d3228 in std::_Hashtable<std::string, std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*>,
std::allocator<std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*>
>, std::__detail::_Se
    #1 0x7fde311d4c68 in std::_Hashtable<std::string, std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*>,
std::allocator<std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*>
>, std::__detail::_Se
    #2 0x7fde311c45bc in std::unordered_map<std::string, kudu::consensus::PeerMessageQueue::TrackedPeer*,
std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string
const, kudu::consensus::PeerMessageQueue::TrackedPee
    #3 0x7fde311b55e4 in kudu::consensus::PeerMessageQueue::UntrackPeer(std::string const&)
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_queue.cc:241:23
    #4 0x7fde311a580c in kudu::consensus::Peer::Close() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers.cc:446:11
    #5 0x7fde311a599a in kudu::consensus::Peer::~Peer() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers.cc:450:3
    #6 0x7fde311af50d in std::_Sp_counted_ptr<kudu::consensus::Peer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose()
/opt/rh/devtoolset-3/root/usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../include/c++/4.9.2/bits/shared_ptr_base.h:373:9
    #7 0x560f75 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /opt/rh/devtoolset-3/root/usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../include/c++/4.9.2/bits/shared_ptr_base.h:149:6
    #9 0x569231 in boost::function0<void>::~function0() /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:763:34
    #10 0x568777 in kudu::consensus::TestPeerProxy::Respond(kudu::consensus::TestPeerProxy::Method)
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus-test-util.h:157:3
    #11 0x56f740 in kudu::consensus::DelayablePeerProxy<kudu::consensus::NoOpTestPeerProxy>::RespondUnlessDelayed(kudu::consensus::TestPeerProxy::Method)
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus-test-util.h:1
    #17 0x7fde2689fb3e in kudu::Thread::SuperviseThread(void*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/thread.cc:624:3

0x615000003fe8 is located 232 bytes inside of 488-byte region [0x615000003f00,0x6150000040e8)
freed by thread T0 here:
    #0 0x551210 in operator delete(void*) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:126
    #1 0x55c522 in kudu::consensus::ConsensusPeersTest::~ConsensusPeersTest() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers-test.cc:74:7
    #2 0x55ad89 in kudu::consensus::ConsensusPeersTest_TestRemotePeer_Test::~ConsensusPeersTest_TestRemotePeer_Test()
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers-test.cc:174:1

previously allocated by thread T0 here:
    #0 0x5504d0 in operator new(unsigned long) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:82
    #1 0x55b3e8 in kudu::consensus::ConsensusPeersTest::SetUp() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers-test.cc:100:26

Specifically, the thread destructor was destructing the ConsensusQueue before
letting tasks drain from the "raft_pool_". One of those tasks might have been
associated with one of the peers that is owned by the queue.

This just adds the appropriate Wait() call to let the tasks drain before
destruction.

To test I ran:

./build-support/dist_test.py loop -n 100 build/latest/bin/consensus_peers-test --gtest_filter=\*RemotePeer\*
--stress-cpu-threads=8 --gtest_repeat=100

Prior to this fix, 100/100 failed[1]. After the fix, 100/100 succeeded[2].

[1] http://dist-test.cloudera.org/job?job_id=todd.1511292638.95033
[2] http://dist-test.cloudera.org/job?job_id=todd.1511292986.102150

Change-Id: I33f5e2da3d2d6c275ece20265b6a455e2c4b967d
Reviewed-on: http://gerrit.cloudera.org:8080/8625
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <mpercy@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/f0d52ab6
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/f0d52ab6
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/f0d52ab6

Branch: refs/heads/master
Commit: f0d52ab64404156bab73eca9abcc0e982ee79eee
Parents: 034b2af
Author: Todd Lipcon <todd@apache.org>
Authored: Tue Nov 21 11:42:55 2017 -0800
Committer: Mike Percy <mpercy@apache.org>
Committed: Tue Nov 21 20:51:16 2017 +0000

----------------------------------------------------------------------
 src/kudu/consensus/consensus_peers-test.cc | 5 +++++
 1 file changed, 5 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/f0d52ab6/src/kudu/consensus/consensus_peers-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/consensus_peers-test.cc b/src/kudu/consensus/consensus_peers-test.cc
index e69dc56..7f4eb57 100644
--- a/src/kudu/consensus/consensus_peers-test.cc
+++ b/src/kudu/consensus/consensus_peers-test.cc
@@ -114,6 +114,11 @@ class ConsensusPeersTest : public KuduTest {
   virtual void TearDown() OVERRIDE {
     ASSERT_OK(log_->WaitUntilAllFlushed());
     messenger_->Shutdown();
+    if (raft_pool_) {
+      // Make sure to drain any tasks from the pool we're using for our delayable
+      // proxy before destructing the queue.
+      raft_pool_->Wait();
+    }
   }
 
   DelayablePeerProxy<NoOpTestPeerProxy>* NewRemotePeer(


Mime
View raw message