kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From dral...@apache.org
Subject kudu git commit: Fix SIGSEGV in ksck
Date Fri, 23 Jun 2017 08:55:31 GMT
Repository: kudu
Updated Branches:
  refs/heads/master d62394293 -> def06d8b8


Fix SIGSEGV in ksck

ksck will segfault when some tablet servers that host tablet
replicas are missing. This happens, for example, if the master
is still restarting and has not yet fully populated its list of
live tablet servers.

The root cause is that a vector of replicas is being sorted by
tserver uuid obtained from the master even if the master is not
aware of the tablet server was not found, causing a segmentation
fault when trying to access the uuid. The fix just checks for a
missing tserver reference and sorts such replicas first.

The included test segfaults without the fix.

Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
Reviewed-on: http://gerrit.cloudera.org:8080/7261
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <davidralves@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/def06d8b
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/def06d8b
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/def06d8b

Branch: refs/heads/master
Commit: def06d8b8155d2aa5838d19f00cce940d2233ad1
Parents: d623942
Author: David Alves <dralves@apache.org>
Authored: Thu Jun 22 17:40:04 2017 +0100
Committer: David Ribeiro Alves <davidralves@gmail.com>
Committed: Fri Jun 23 08:54:41 2017 +0000

----------------------------------------------------------------------
 src/kudu/tools/ksck-test.cc | 13 +++++++++++++
 src/kudu/tools/ksck.cc      |  6 ++++--
 2 files changed, 17 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/def06d8b/src/kudu/tools/ksck-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/ksck-test.cc b/src/kudu/tools/ksck-test.cc
index b6ed5b3..496726f 100644
--- a/src/kudu/tools/ksck-test.cc
+++ b/src/kudu/tools/ksck-test.cc
@@ -503,5 +503,18 @@ TEST_F(KsckTest, TestTabletNotRunning) {
       "    Last status: \n");
 }
 
+// Test for a bug where we weren't properly handling a tserver not reported by the master.
+TEST_F(KsckTest, TestMissingTserver) {
+  CreateOneSmallReplicatedTable();
+
+  // Delete a tablet server from the master's list. This simulates a situation
+  // where the master is starting and hasn't listed all tablet servers yet, but
+  // tablets from other tablet servers are listing the missing tablet server as a peer.
+  EraseKeyReturnValuePtr(&master_->tablet_servers_, "ts-id-0");
+  Status s = RunKsck();
+  ASSERT_EQ("Corruption: 1 table(s) are bad", s.ToString());
+  ASSERT_STR_CONTAINS(err_stream_.str(), "Table test has 3 under-replicated tablet(s)");
+}
+
 } // namespace tools
 } // namespace kudu

http://git-wip-us.apache.org/repos/asf/kudu/blob/def06d8b/src/kudu/tools/ksck.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/ksck.cc b/src/kudu/tools/ksck.cc
index b4fa190..f1eec40 100644
--- a/src/kudu/tools/ksck.cc
+++ b/src/kudu/tools/ksck.cc
@@ -661,8 +661,8 @@ Ksck::CheckResult Ksck::VerifyTablet(const shared_ptr<KsckTablet>&
tablet, int t
 
     // Check for agreement on tablet assignment and state between the master
     // and the tablet server.
-    auto ts = FindPtrOrNull(cluster_->tablet_servers(), replica->ts_uuid());
-    repl_info->ts = ts.get();
+    auto ts = FindPointeeOrNull(cluster_->tablet_servers(), replica->ts_uuid());
+    repl_info->ts = ts;
     if (ts && ts->is_healthy()) {
       repl_info->state = ts->ReplicaState(tablet->id());
       if (ContainsKey(ts->tablet_status_map(), tablet->id())) {
@@ -727,6 +727,8 @@ Ksck::CheckResult Ksck::VerifyTablet(const shared_ptr<KsckTablet>&
tablet, int t
   }
   std::sort(replica_infos.begin(), replica_infos.end(),
             [](const ReplicaInfo& left, const ReplicaInfo& right) -> bool {
+              if (!left.ts) return true;
+              if (!right.ts) return false;
               return left.ts->uuid() < right.ts->uuid();
             });
 


Mime
View raw message