kudu-commits mailing list archives

From t...@apache.org
Subject kudu git commit: Fix SIGSEGV in ksck
Date Fri, 28 Jul 2017 17:15:59 GMT
Repository: kudu
Updated Branches:
  refs/heads/branch-1.4.x 5b0b786d2 -> 7722dc8e0


Fix SIGSEGV in ksck

ksck will segfault when some tablet servers that host tablet
replicas are missing. This happens, for example, if the master
is still restarting and has not yet fully populated its list of
live tablet servers.
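
As an illustration of that condition, here is a minimal C++ sketch of a
lookup by uuid that comes back empty when the master has not listed the
tablet server. TabletServer, TabletServerMap, FindTabletServerOrNull and
IsKnownAndHealthy are hypothetical names made up for this example; they
are not the ksck types or the helpers ksck uses. The only point is that
the caller has to handle a null result before touching the server.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a tablet server entry; not the ksck class.
struct TabletServer {
  std::string uuid;
  bool healthy;
};

// Hypothetical stand-in for the master's list of live tablet servers.
using TabletServerMap = std::map<std::string, std::shared_ptr<TabletServer>>;

// Returns a raw pointer to the server registered under 'uuid', or nullptr
// if the map has no entry for it (for example, the master has not yet
// heard from that server).
const TabletServer* FindTabletServerOrNull(const TabletServerMap& servers,
                                           const std::string& uuid) {
  auto it = servers.find(uuid);
  if (it == servers.end()) return nullptr;
  // Assumption of this sketch: entries present in the map are never null.
  assert(it->second != nullptr);
  return it->second.get();
}

// Checks for a missing server before dereferencing it.
bool IsKnownAndHealthy(const TabletServerMap& servers,
                       const std::string& uuid) {
  const TabletServer* ts = FindTabletServerOrNull(servers, uuid);
  return ts != nullptr && ts->healthy;
}
```

Any code that later uses the returned pointer, such as a sort by uuid,
needs the same null check.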

The root cause is that a vector of replicas is sorted by the tserver
uuid obtained from the master, even when the master is not aware of
the tablet server. In that case the tserver lookup finds nothing, and
accessing the uuid causes a segmentation fault. The fix checks for a
missing tserver reference and sorts such replicas first.

The included test segfaults without the fix.

Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
Reviewed-on: http://gerrit.cloudera.org:8080/7261
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <davidralves@gmail.com>
(cherry picked from commit def06d8b8155d2aa5838d19f00cce940d2233ad1)
Reviewed-on: http://gerrit.cloudera.org:8080/7536
Reviewed-by: Jean-Daniel Cryans <jdcryans@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/7722dc8e
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/7722dc8e
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/7722dc8e

Branch: refs/heads/branch-1.4.x
Commit: 7722dc8e0aac8c3d8e438d104ff8ffb6d3739b23
Parents: 5b0b786
Author: David Alves <dralves@apache.org>
Authored: Thu Jun 22 17:40:04 2017 +0100
Committer: Todd Lipcon <todd@apache.org>
Committed: Fri Jul 28 17:13:59 2017 +0000

----------------------------------------------------------------------
 src/kudu/tools/ksck-test.cc | 13 +++++++++++++
 src/kudu/tools/ksck.cc      |  6 ++++--
 2 files changed, 17 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/7722dc8e/src/kudu/tools/ksck-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/ksck-test.cc b/src/kudu/tools/ksck-test.cc
index 90a8f1d..1e0ea24 100644
--- a/src/kudu/tools/ksck-test.cc
+++ b/src/kudu/tools/ksck-test.cc
@@ -503,5 +503,18 @@ TEST_F(KsckTest, TestTabletNotRunning) {
       "    Last status: \n");
 }
 
+// Test for a bug where we weren't properly handling a tserver not reported by the master.
+TEST_F(KsckTest, TestMissingTserver) {
+  CreateOneSmallReplicatedTable();
+
+  // Delete a tablet server from the master's list. This simulates a situation
+  // where the master is starting and hasn't listed all tablet servers yet, but
+  // tablets from other tablet servers are listing the missing tablet server as a peer.
+  EraseKeyReturnValuePtr(&master_->tablet_servers_, "ts-id-0");
+  Status s = RunKsck();
+  ASSERT_EQ("Corruption: 1 table(s) are bad", s.ToString());
+  ASSERT_STR_CONTAINS(err_stream_.str(), "Table test has 3 under-replicated tablet(s)");
+}
+
 } // namespace tools
 } // namespace kudu

http://git-wip-us.apache.org/repos/asf/kudu/blob/7722dc8e/src/kudu/tools/ksck.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/ksck.cc b/src/kudu/tools/ksck.cc
index a99d4c1..35dec83 100644
--- a/src/kudu/tools/ksck.cc
+++ b/src/kudu/tools/ksck.cc
@@ -661,8 +661,8 @@ Ksck::CheckResult Ksck::VerifyTablet(const shared_ptr<KsckTablet>& tablet, int t
 
     // Check for agreement on tablet assignment and state between the master
     // and the tablet server.
-    auto ts = FindPtrOrNull(cluster_->tablet_servers(), replica->ts_uuid());
-    repl_info->ts = ts.get();
+    auto ts = FindPointeeOrNull(cluster_->tablet_servers(), replica->ts_uuid());
+    repl_info->ts = ts;
     if (ts && ts->is_healthy()) {
       repl_info->state = ts->ReplicaState(tablet->id());
       if (ContainsKey(ts->tablet_status_map(), tablet->id())) {
@@ -727,6 +727,8 @@ Ksck::CheckResult Ksck::VerifyTablet(const shared_ptr<KsckTablet>& tablet, int t
   }
   std::sort(replica_infos.begin(), replica_infos.end(),
             [](const ReplicaInfo& left, const ReplicaInfo& right) -> bool {
+              if (!left.ts) return true;
+              if (!right.ts) return false;
               return left.ts->uuid() < right.ts->uuid();
             });
 

