kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From t...@apache.org
Subject [3/3] kudu git commit: KUDU-2193 (part 2): avoid holding TSTabletManager::lock_ for a long time
Date Wed, 01 Nov 2017 05:00:20 GMT
KUDU-2193 (part 2): avoid holding TSTabletManager::lock_ for a long time

This changes tablet report generation to only hold the TSTabletManager lock
long enough to copy a list of refs to the relevant tablets.

Even though the lock is a reader-writer lock, a long read-side critical
section can end up blocking other readers as long as any writer shows up.
I saw the following in a cluster:

- election storm ongoing
- T1 generating a tablet report (thus holding the reader lock)
  - blocks for a long time getting ConsensusState from some tablets currently
    mid-fsync.
- T2 handling a DeleteTablet or CreateTablet call (waiting for writer lock)
- T3 through T20: blocking on reader lock in LookupTablet()

The effect here is that all other threads end up blocked until T1 finishes
its tablet report generation, which in this case can be tens of seconds due
to all of the fsync traffic. These blocked threads then contribute to the
ongoing election storm since they may delay timely responses to vote requests
from other tablets.

I tested this and the previous patch on a cluster that was previously
experiencing the issue. I triggered some elections by kill -STOPping some
servers and resuming them a few seconds later. Without these patches,
other servers started doing lots of context switches (due to the spinlock
used prior to this patch series) and I saw lots of blocked threads in
pstack. With the patch, things seem cleaner.

The following screenshot shows before/after voluntary_context_switch rate:

https://i.imgur.com/7zcbImw.png

(the missing data around 5pm is my restarting the servers with this patch)

Change-Id: I0f645775e8347a18af112b308dba56a3b4a2c681
Reviewed-on: http://gerrit.cloudera.org:8080/8346
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/f2c21b4f
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/f2c21b4f
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/f2c21b4f

Branch: refs/heads/master
Commit: f2c21b4fac15fcaa5f53a6c7960ecd19a0664c45
Parents: 94bf699
Author: Todd Lipcon <todd@apache.org>
Authored: Thu Oct 19 17:29:41 2017 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Wed Nov 1 04:44:02 2017 +0000

----------------------------------------------------------------------
 src/kudu/tserver/ts_tablet_manager.cc | 46 +++++++++++++++++++-----------
 src/kudu/tserver/ts_tablet_manager.h  |  3 +-
 2 files changed, 31 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/f2c21b4f/src/kudu/tserver/ts_tablet_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tserver/ts_tablet_manager.cc b/src/kudu/tserver/ts_tablet_manager.cc
index 5ff61d5..47eae78 100644
--- a/src/kudu/tserver/ts_tablet_manager.cc
+++ b/src/kudu/tserver/ts_tablet_manager.cc
@@ -1131,10 +1131,9 @@ void TSTabletManager::InitLocalRaftPeerPB() {
   CHECK_OK(HostPortToPB(hp, local_peer_pb_.mutable_last_known_addr()));
 }
 
-void TSTabletManager::CreateReportedTabletPB(const string& tablet_id,
-                                             const scoped_refptr<TabletReplica>&
replica,
+void TSTabletManager::CreateReportedTabletPB(const scoped_refptr<TabletReplica>&
replica,
                                              ReportedTabletPB* reported_tablet) const {
-  reported_tablet->set_tablet_id(tablet_id);
+  reported_tablet->set_tablet_id(replica->tablet_id());
   reported_tablet->set_state(replica->state());
   reported_tablet->set_tablet_data_state(replica->tablet_metadata()->tablet_data_state());
   const Status& error = replica->error();
@@ -1151,26 +1150,41 @@ void TSTabletManager::CreateReportedTabletPB(const string& tablet_id,
 }
 
 void TSTabletManager::PopulateFullTabletReport(TabletReportPB* report) const {
-  shared_lock<RWMutex> shared_lock(lock_);
-  for (const auto& e : tablet_map_) {
-    CreateReportedTabletPB(e.first, e.second, report->add_updated_tablets());
+  // Creating the tablet report can be slow in the case that it is in the
+  // middle of flushing its consensus metadata. We don't want to hold
+  // lock_ for too long, even in read mode, since it can cause other readers
+  // to block if there is a waiting writer (see KUDU-2193). So, we just make
+  // a local copy of the set of replicas.
+  vector<scoped_refptr<tablet::TabletReplica>> to_report;
+  GetTabletReplicas(&to_report);
+  for (const auto& replica : to_report) {
+    CreateReportedTabletPB(replica, report->add_updated_tablets());
   }
 }
 
 void TSTabletManager::PopulateIncrementalTabletReport(TabletReportPB* report,
                                                       const vector<string>& tablet_ids)
const {
-  shared_lock<RWMutex> shared_lock(lock_);
-  for (const auto& id : tablet_ids) {
-    const scoped_refptr<tablet::TabletReplica>* replica =
-        FindOrNull(tablet_map_, id);
-    if (replica) {
-      // Dirty entry, report on it.
-      CreateReportedTabletPB(id, *replica, report->add_updated_tablets());
-    } else {
-      // Removed.
-      report->add_removed_tablet_ids(id);
+  // See comment in PopulateFullTabletReport for rationale on making a local
+  // copy of the set of tablets to report.
+  vector<scoped_refptr<tablet::TabletReplica>> to_report;
+  to_report.reserve(tablet_ids.size());
+  {
+    shared_lock<RWMutex> shared_lock(lock_);
+    for (const auto& id : tablet_ids) {
+      const scoped_refptr<tablet::TabletReplica>* replica =
+          FindOrNull(tablet_map_, id);
+      if (replica) {
+        // Dirty entry, report on it.
+        to_report.push_back(*replica);
+      } else {
+        // Removed.
+        report->add_removed_tablet_ids(id);
+      }
     }
   }
+  for (const auto& replica : to_report) {
+    CreateReportedTabletPB(replica, report->add_updated_tablets());
+  }
 }
 
 Status TSTabletManager::HandleNonReadyTabletOnStartup(const scoped_refptr<TabletMetadata>&
meta) {

http://git-wip-us.apache.org/repos/asf/kudu/blob/f2c21b4f/src/kudu/tserver/ts_tablet_manager.h
----------------------------------------------------------------------
diff --git a/src/kudu/tserver/ts_tablet_manager.h b/src/kudu/tserver/ts_tablet_manager.h
index 6c9cd28..317ab9f 100644
--- a/src/kudu/tserver/ts_tablet_manager.h
+++ b/src/kudu/tserver/ts_tablet_manager.h
@@ -278,8 +278,7 @@ class TSTabletManager : public tserver::TabletReplicaLookupIf {
                                         scoped_refptr<tablet::TabletReplica>* replica_out);
 
   // Helper to generate the report for a single tablet.
-  void CreateReportedTabletPB(const std::string& tablet_id,
-                              const scoped_refptr<tablet::TabletReplica>& replica,
+  void CreateReportedTabletPB(const scoped_refptr<tablet::TabletReplica>& replica,
                               master::ReportedTabletPB* reported_tablet) const;
 
   // Handle the case on startup where we find a tablet that is not in


Mime
View raw message