kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From t...@apache.org
Subject [1/2] kudu git commit: client: throttle some warning messages when lookup RPCs fail
Date Mon, 05 Dec 2016 04:52:57 GMT
Repository: kudu
Updated Branches:
  refs/heads/master 80ac8bae3 -> 47c25ceae


client: throttle some warning messages when lookup RPCs fail

In an Impala stress workload, we found that LookupRpc::SendRpcCb was
spewing warnings at an alarming rate. The warnings themselves were
causing so much load (due to Impala's glog lock contention) that it
produced a vicious cycle: the log lock contention caused more timeouts,
which caused more warnings, which caused more lock contention, etc...

I figured out the address of the glog mutex in an impalad process and
used perf to trace the stack trace of its lock acquisitions over a
5-second period:

[todd@ve1120 kudu]$ sudo perf record -a -e syscalls:sys_enter_futex --filter 'uaddr == "0x7f6950a7714c"'
-g sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.191 MB perf.data (~183118 samples) ]
[todd@ve1120 kudu]$ sudo perf script | grep -A1 'LogMess' | grep client:: | sort | uniq -c
    272             7f6950477224 kudu::client::internal::WriteRpc::Finish(kudu::Status const&)
     92             7f695049be85 kudu::client::internal::RemoteTablet::MarkReplicaFailed(kudu::client::internal::RemoteTabletServer*,
kudu::Status const&)
  14453             7f69504a1bc5 kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status
const&)

Here we can see the vast majority came from LookupRpc::SendRpcCb and
WriteRpc::Finish. This patch addresses those as well as a few others
that look like they might potentially occur frequently.

Change-Id: I840bd57976ccefba4453667f82c7aa32756922d3
Reviewed-on: http://gerrit.cloudera.org:8080/5332
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <adar@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/f0c27b5a
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/f0c27b5a
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/f0c27b5a

Branch: refs/heads/master
Commit: f0c27b5aa0616be8dd4db30a0682eab0a6218f51
Parents: 80ac8ba
Author: Todd Lipcon <todd@apache.org>
Authored: Fri Dec 2 13:38:06 2016 -0800
Committer: Todd Lipcon <todd@apache.org>
Committed: Mon Dec 5 04:51:19 2016 +0000

----------------------------------------------------------------------
 src/kudu/client/batcher.cc         |  2 +-
 src/kudu/client/client-internal.cc | 22 +++++++++++++---------
 src/kudu/client/meta_cache.cc      | 15 ++++++++-------
 3 files changed, 22 insertions(+), 17 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/f0c27b5a/src/kudu/client/batcher.cc
----------------------------------------------------------------------
diff --git a/src/kudu/client/batcher.cc b/src/kudu/client/batcher.cc
index fb1d8a5..0c3c720 100644
--- a/src/kudu/client/batcher.cc
+++ b/src/kudu/client/batcher.cc
@@ -316,7 +316,7 @@ void WriteRpc::Finish(const Status& status) {
     final_status = final_status.CloneAndPrepend(
         Substitute("Failed to write batch of $0 ops to tablet $1 after $2 attempt(s)",
                    ops_.size(), tablet_id_, num_attempts()));
-    LOG(WARNING) << final_status.ToString();
+    KLOG_EVERY_N_SECS(WARNING, 1) << final_status.ToString();
   }
   batcher_->ProcessWriteResponse(*this, final_status);
 }

http://git-wip-us.apache.org/repos/asf/kudu/blob/f0c27b5a/src/kudu/client/client-internal.cc
----------------------------------------------------------------------
diff --git a/src/kudu/client/client-internal.cc b/src/kudu/client/client-internal.cc
index c6f7a31..e483be9 100644
--- a/src/kudu/client/client-internal.cc
+++ b/src/kudu/client/client-internal.cc
@@ -40,6 +40,7 @@
 #include "kudu/rpc/rpc.h"
 #include "kudu/rpc/rpc_controller.h"
 #include "kudu/rpc/rpc_header.pb.h"
+#include "kudu/util/logging.h"
 #include "kudu/util/net/dns_resolver.h"
 #include "kudu/util/net/net_util.h"
 #include "kudu/util/thread_restrictions.h"
@@ -185,9 +186,10 @@ Status KuduClient::Data::SyncLeaderMasterRpc(
     }
 
     if (s.IsNetworkError()) {
-      LOG(WARNING) << "Unable to send the request (" << req.ShortDebugString()
-                   << ") to leader Master (" << leader_master_hostport().ToString()
-                   << "): " << s.ToString();
+      KLOG_EVERY_N_SECS(WARNING, 1)
+          << "Unable to send the request (" << req.ShortDebugString()
+          << ") to leader Master (" << leader_master_hostport().ToString() <<
"): "
+          << s.ToString();
       if (client->IsMultiMaster()) {
         LOG(INFO) << "Determining the new leader Master and retrying...";
         WARN_NOT_OK(SetMasterServerProxy(client, deadline),
@@ -198,9 +200,10 @@ Status KuduClient::Data::SyncLeaderMasterRpc(
 
     if (s.IsTimedOut()) {
       if (MonoTime::Now() < deadline) {
-        LOG(WARNING) << "Unable to send the request (" << req.ShortDebugString()
-                     << ") to leader Master (" << leader_master_hostport().ToString()
-                     << "): " << s.ToString();
+        KLOG_EVERY_N_SECS(WARNING, 1)
+            << "Unable to send the request (" << req.ShortDebugString()
+            << ") to leader Master (" << leader_master_hostport().ToString()
+            << "): " << s.ToString();
         if (client->IsMultiMaster()) {
           LOG(INFO) << "Determining the new leader Master and retrying...";
           WARN_NOT_OK(SetMasterServerProxy(client, deadline),
@@ -218,7 +221,7 @@ Status KuduClient::Data::SyncLeaderMasterRpc(
       if (resp->error().code() == MasterErrorPB::NOT_THE_LEADER ||
           resp->error().code() == MasterErrorPB::CATALOG_MANAGER_NOT_INITIALIZED) {
         if (client->IsMultiMaster()) {
-          LOG(INFO) << "Determining the new leader Master and retrying...";
+          KLOG_EVERY_N_SECS(INFO, 1) << "Determining the new leader Master and retrying...";
           WARN_NOT_OK(SetMasterServerProxy(client, deadline),
                       "Unable to determine the new leader Master");
           continue;
@@ -649,8 +652,9 @@ void KuduClient::Data::SetMasterServerProxyAsync(KuduClient* client,
       return;
     }
     if (addrs.size() > 1) {
-      LOG(WARNING) << "Specified master server address '" << master_server_addr
<< "' "
-                   << "resolved to multiple IPs. Using " << addrs[0].ToString();
+      KLOG_EVERY_N_SECS(WARNING, 1)
+          << "Specified master server address '" << master_server_addr <<
"' "
+          << "resolved to multiple IPs. Using " << addrs[0].ToString();
     }
     master_sockaddrs.push_back(addrs[0]);
   }

http://git-wip-us.apache.org/repos/asf/kudu/blob/f0c27b5a/src/kudu/client/meta_cache.cc
----------------------------------------------------------------------
diff --git a/src/kudu/client/meta_cache.cc b/src/kudu/client/meta_cache.cc
index 82bcc5d..5de9652 100644
--- a/src/kudu/client/meta_cache.cc
+++ b/src/kudu/client/meta_cache.cc
@@ -33,6 +33,7 @@
 #include "kudu/rpc/messenger.h"
 #include "kudu/rpc/rpc.h"
 #include "kudu/tserver/tserver_service.proxy.h"
+#include "kudu/util/logging.h"
 #include "kudu/util/net/dns_resolver.h"
 #include "kudu/util/net/net_util.h"
 
@@ -193,7 +194,7 @@ void RemoteTablet::MarkReplicaFailed(RemoteTabletServer *ts,
   std::lock_guard<simple_spinlock> l(lock_);
   VLOG(2) << "Tablet " << tablet_id_ << ": Current remote replicas in meta
cache: "
           << ReplicasAsStringUnlocked();
-  LOG(WARNING) << "Tablet " << tablet_id_ << ": Replica " << ts->ToString()
+  KLOG_EVERY_N_SECS(WARNING, 1) << "Tablet " << tablet_id_ << ": Replica
" << ts->ToString()
                << " has failed: " << status.ToString();
   for (RemoteReplica& rep : replicas_) {
     if (rep.ts == ts) {
@@ -682,7 +683,7 @@ void LookupRpc::NewLeaderMasterDeterminedCb(const Status& status)
{
     mutable_retrier()->mutable_controller()->Reset();
     SendRpc();
   } else {
-    LOG(WARNING) << "Failed to determine new Master: " << status.ToString();
+    KLOG_EVERY_N_SECS(WARNING, 1) << "Failed to determine new Master: " << status.ToString();
     mutable_retrier()->DelayedRetry(this, status);
   }
 }
@@ -707,7 +708,7 @@ void LookupRpc::SendRpcCb(const Status& status) {
     if (resp_.error().code() == master::MasterErrorPB::NOT_THE_LEADER ||
         resp_.error().code() == master::MasterErrorPB::CATALOG_MANAGER_NOT_INITIALIZED) {
       if (meta_cache_->client_->IsMultiMaster()) {
-        LOG(WARNING) << "Leader Master has changed, re-trying...";
+        KLOG_EVERY_N_SECS(WARNING, 1) << "Leader Master has changed, re-trying...";
         ResetMasterLeaderAndRetry();
         ignore_result(delete_me.release());
         return;
@@ -720,7 +721,7 @@ void LookupRpc::SendRpcCb(const Status& status) {
   if (new_status.IsTimedOut()) {
     if (MonoTime::Now() < retrier().deadline()) {
       if (meta_cache_->client_->IsMultiMaster()) {
-        LOG(WARNING) << "Leader Master timed out, re-trying...";
+        KLOG_EVERY_N_SECS(WARNING, 1) << "Leader Master timed out, re-trying...";
         ResetMasterLeaderAndRetry();
         ignore_result(delete_me.release());
         return;
@@ -734,8 +735,8 @@ void LookupRpc::SendRpcCb(const Status& status) {
 
   if (new_status.IsNetworkError()) {
     if (meta_cache_->client_->IsMultiMaster()) {
-      LOG(WARNING) << "Encountered a network error from the Master: " << new_status.ToString()
-                     << ", retrying...";
+      KLOG_EVERY_N_SECS(WARNING, 1) << "Encountered a network error from the Master:
"
+                                    << new_status.ToString() << ", retrying...";
       ResetMasterLeaderAndRetry();
       ignore_result(delete_me.release());
       return;
@@ -760,7 +761,7 @@ void LookupRpc::SendRpcCb(const Status& status) {
     }
   } else {
     new_status = new_status.CloneAndPrepend(Substitute("$0 failed", ToString()));
-    LOG(WARNING) << new_status.ToString();
+    KLOG_EVERY_N_SECS(WARNING, 1) << new_status.ToString();
   }
   user_cb_.Run(new_status);
 }


Mime
View raw message