From commits-return-3986-apmail-kudu-commits-archive=kudu.apache.org@kudu.apache.org Fri Jun 23 08:55:34 2017 Return-Path: X-Original-To: apmail-kudu-commits-archive@minotaur.apache.org Delivered-To: apmail-kudu-commits-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DF8D419E60 for ; Fri, 23 Jun 2017 08:55:34 +0000 (UTC) Received: (qmail 1590 invoked by uid 500); 23 Jun 2017 08:55:34 -0000 Delivered-To: apmail-kudu-commits-archive@kudu.apache.org Received: (qmail 1555 invoked by uid 500); 23 Jun 2017 08:55:34 -0000 Mailing-List: contact commits-help@kudu.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kudu.apache.org Delivered-To: mailing list commits@kudu.apache.org Received: (qmail 1545 invoked by uid 99); 23 Jun 2017 08:55:34 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 23 Jun 2017 08:55:34 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id B69B3DFA6C; Fri, 23 Jun 2017 08:55:31 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: dralves@apache.org To: commits@kudu.apache.org Message-Id: X-Mailer: ASF-Git Admin Mailer Subject: kudu git commit: Fix SIGSEGV in ksck Date: Fri, 23 Jun 2017 08:55:31 +0000 (UTC) Repository: kudu Updated Branches: refs/heads/master d62394293 -> def06d8b8 Fix SIGSEGV in ksck ksck will segfault when some tablet servers that host tablet replicas are missing. This happens, for example, if the master is still restarting and has not yet fully populated its list of live tablet servers. The root cause is that a vector of replicas is being sorted by tserver uuid obtained from the master even if the master is not aware of the tablet server was not found, causing a segmentation fault when trying to access the uuid. The fix just checks for a missing tserver reference and sorts such replicas first. The included test segfaults without the fix. Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf Reviewed-on: http://gerrit.cloudera.org:8080/7261 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves Project: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/def06d8b Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/def06d8b Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/def06d8b Branch: refs/heads/master Commit: def06d8b8155d2aa5838d19f00cce940d2233ad1 Parents: d623942 Author: David Alves Authored: Thu Jun 22 17:40:04 2017 +0100 Committer: David Ribeiro Alves Committed: Fri Jun 23 08:54:41 2017 +0000 ---------------------------------------------------------------------- src/kudu/tools/ksck-test.cc | 13 +++++++++++++ src/kudu/tools/ksck.cc | 6 ++++-- 2 files changed, 17 insertions(+), 2 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/kudu/blob/def06d8b/src/kudu/tools/ksck-test.cc ---------------------------------------------------------------------- diff --git a/src/kudu/tools/ksck-test.cc b/src/kudu/tools/ksck-test.cc index b6ed5b3..496726f 100644 --- a/src/kudu/tools/ksck-test.cc +++ b/src/kudu/tools/ksck-test.cc @@ -503,5 +503,18 @@ TEST_F(KsckTest, TestTabletNotRunning) { " Last status: \n"); } +// Test for a bug where we weren't properly handling a tserver not reported by the master. +TEST_F(KsckTest, TestMissingTserver) { + CreateOneSmallReplicatedTable(); + + // Delete a tablet server from the master's list. This simulates a situation + // where the master is starting and hasn't listed all tablet servers yet, but + // tablets from other tablet servers are listing the missing tablet server as a peer. + EraseKeyReturnValuePtr(&master_->tablet_servers_, "ts-id-0"); + Status s = RunKsck(); + ASSERT_EQ("Corruption: 1 table(s) are bad", s.ToString()); + ASSERT_STR_CONTAINS(err_stream_.str(), "Table test has 3 under-replicated tablet(s)"); +} + } // namespace tools } // namespace kudu http://git-wip-us.apache.org/repos/asf/kudu/blob/def06d8b/src/kudu/tools/ksck.cc ---------------------------------------------------------------------- diff --git a/src/kudu/tools/ksck.cc b/src/kudu/tools/ksck.cc index b4fa190..f1eec40 100644 --- a/src/kudu/tools/ksck.cc +++ b/src/kudu/tools/ksck.cc @@ -661,8 +661,8 @@ Ksck::CheckResult Ksck::VerifyTablet(const shared_ptr& tablet, int t // Check for agreement on tablet assignment and state between the master // and the tablet server. - auto ts = FindPtrOrNull(cluster_->tablet_servers(), replica->ts_uuid()); - repl_info->ts = ts.get(); + auto ts = FindPointeeOrNull(cluster_->tablet_servers(), replica->ts_uuid()); + repl_info->ts = ts; if (ts && ts->is_healthy()) { repl_info->state = ts->ReplicaState(tablet->id()); if (ContainsKey(ts->tablet_status_map(), tablet->id())) { @@ -727,6 +727,8 @@ Ksck::CheckResult Ksck::VerifyTablet(const shared_ptr& tablet, int t } std::sort(replica_infos.begin(), replica_infos.end(), [](const ReplicaInfo& left, const ReplicaInfo& right) -> bool { + if (!left.ts) return true; + if (!right.ts) return false; return left.ts->uuid() < right.ts->uuid(); });