kudu-commits mailing list archives

From t...@apache.org
Subject [3/5] kudu git commit: Temporary workaround for KUDU-1959 (race when selecting rowsets)
Date Thu, 08 Jun 2017 22:56:18 GMT
Temporary workaround for KUDU-1959 (race when selecting rowsets)

As described in the JIRA, multiple maintenance manager (MM) threads can
race to pick the same rowsets for compaction. Rather than crashing when
this bug is hit, it is safe to simply abort that compaction attempt.
The MM will log a warning about the failed compaction and try again.

This is a temporary workaround for the 1.4 release, since the issue was
recently reported in the wild on the user mailing list.

Change-Id: I9db313849176e1bf05636d969fafb1682e6d78de
Reviewed-on: http://gerrit.cloudera.org:8080/7120
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Kudu Jenkins

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/8be2a591
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/8be2a591
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/8be2a591

Branch: refs/heads/master
Commit: 8be2a59103da46472062f47f89efa6e1bddd0a5c
Parents: 693f675
Author: Todd Lipcon <todd@apache.org>
Authored: Thu Jun 8 14:07:52 2017 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Thu Jun 8 22:04:19 2017 +0000

 src/kudu/tablet/tablet.cc | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/kudu/tablet/tablet.cc b/src/kudu/tablet/tablet.cc
index aaaa72b..fb6043b 100644
--- a/src/kudu/tablet/tablet.cc
+++ b/src/kudu/tablet/tablet.cc
@@ -1219,7 +1219,13 @@ Status Tablet::PickRowSetsToCompact(RowSetsInCompaction *picked,
       LOG_WITH_PREFIX(ERROR) << "Rowset selected for compaction but not available anymore: "
                              << not_found->ToString();
-    LOG_WITH_PREFIX(FATAL) << "Was unable to find all rowsets selected for compaction";
+    // TODO(todd): this should never happen, but KUDU-1959 is a bug which causes us to
+    // sometimes concurrently decide to compact the same rowsets. It should be harmless
+    // to simply abort the compaction when we hit this bug, though long term we should
+    // fix the underlying race.
+    const char* msg = "Was unable to find all rowsets selected for compaction";
+    return Status::RuntimeError(msg);
   return Status::OK();
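The workaround's pattern (verify that every selected rowset is still available, and return an error instead of crashing if one is not) can be sketched in isolation. This is a minimal, self-contained illustration, not Kudu code: `ToyStatus`, `ToyTablet`, and `SimulateRace` are hypothetical stand-ins and are not Kudu's `Status` or `Tablet` classes. The race is replayed sequentially for determinism. The sketch assumes that two pickers chose the same rowsets from a stale snapshot before either one claimed them.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for kudu::Status (NOT Kudu's real class).
struct ToyStatus {
  bool ok;
  std::string message;
  static ToyStatus OK() { return {true, ""}; }
  static ToyStatus RuntimeError(std::string msg) { return {false, std::move(msg)}; }
};

// Hypothetical tablet that tracks which rowset IDs are still free to compact.
class ToyTablet {
 public:
  explicit ToyTablet(std::set<int> rowsets) : available_(std::move(rowsets)) {}

  // Claims either all rowsets in `selected` or none of them. If a concurrent
  // picker already claimed one, return an error instead of crashing.
  ToyStatus PickRowSetsToCompact(const std::vector<int>& selected,
                                 std::vector<int>* picked) {
    assert(picked != nullptr);
    std::lock_guard<std::mutex> guard(lock_);
    for (int id : selected) {
      if (available_.count(id) == 0) {
        std::cerr << "Rowset selected for compaction but not available anymore: "
                  << id << "\n";
        return ToyStatus::RuntimeError(
            "Was unable to find all rowsets selected for compaction");
      }
    }
    for (int id : selected) {
      available_.erase(id);
      picked->push_back(id);
    }
    return ToyStatus::OK();
  }

 private:
  std::mutex lock_;
  std::set<int> available_;
};

// Replays the racy interleaving sequentially: two maintenance threads both
// chose rowsets {1, 2} from the same stale snapshot. Returns how many of the
// two pick attempts succeeded.
int SimulateRace() {
  ToyTablet tablet(std::set<int>{1, 2, 3});
  const std::vector<int> stale_choice = {1, 2};
  int successes = 0;
  for (int attempt = 0; attempt < 2; ++attempt) {
    std::vector<int> picked;
    ToyStatus s = tablet.PickRowSetsToCompact(stale_choice, &picked);
    if (s.ok) {
      ++successes;
    } else {
      // The caller logs a warning and can retry later rather than crashing.
      std::cerr << "WARNING: compaction failed: " << s.message << "\n";
    }
  }
  return successes;
}
```

Under this sketch's assumptions, exactly one attempt claims rowsets 1 and 2. The other attempt logs a warning and gets an error, and its `picked` vector stays empty. This mirrors the commit's choice of returning `Status::RuntimeError` in place of a `FATAL` log.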
