kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From t...@apache.org
Subject kudu git commit: process_memory: incrementally track total process consumption
Date Thu, 04 May 2017 02:33:11 GMT
Repository: kudu
Updated Branches:
  refs/heads/master 579cb0b14 -> b15ea202d


process_memory: incrementally track total process consumption

This switches CurrentConsumption() from calling into TCMalloc to instead
using its own tracking, based on installing new/delete hooks into
tcmalloc. The hooks incrementally update a global LongAdder for
consumption, and we just query that object when we need to know the
current value.

This is meant to address a performance regression seen on YCSB where the
tablet server spends a lot of time in tcmalloc::ThreadCache::GetThreadStats
trying to compute memory usage.

I wrote a small micro-benchmark which starts 200 threads which allocate
memory and call CurrentConsumption(). Without the patch, it only
performed about 650K iters/sec. With the patch, it performs around 5M
iters/sec. With no tracking at all (no hooks and no consumption calls)
the performance is around 25M iterations/second. So, we are definitely
paying some cost for the hooks, but this should at least be an
improvement. If it doesn't win back all the performance, we can look for
an even better approach.

On the side, this changes process_memory to use GoogleOnce instead of
std::once. This sped up the microbenchmark significantly (almost 2x),
apparently due to better inlining.

Change-Id: I3293110b7eda7f083bf400e165be3d782c441c49
Reviewed-on: http://gerrit.cloudera.org:8080/6793
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/b15ea202
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/b15ea202
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/b15ea202

Branch: refs/heads/master
Commit: b15ea202d6a2be427614a876aa56d03b9c8b3a1b
Parents: 579cb0b
Author: Todd Lipcon <todd@apache.org>
Authored: Wed May 3 17:58:09 2017 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Thu May 4 02:32:53 2017 +0000

----------------------------------------------------------------------
 src/kudu/tserver/tablet_server-stress-test.cc | 11 +++
 src/kudu/util/CMakeLists.txt                  |  1 +
 src/kudu/util/process_memory-test.cc          | 71 +++++++++++++++++
 src/kudu/util/process_memory.cc               | 88 +++++++++++++++-------
 src/kudu/util/process_memory.h                |  8 ++
 5 files changed, 152 insertions(+), 27 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/b15ea202/src/kudu/tserver/tablet_server-stress-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tserver/tablet_server-stress-test.cc b/src/kudu/tserver/tablet_server-stress-test.cc
index b303cdc..8c4059f 100644
--- a/src/kudu/tserver/tablet_server-stress-test.cc
+++ b/src/kudu/tserver/tablet_server-stress-test.cc
@@ -20,6 +20,7 @@
 
 #include "kudu/gutil/strings/substitute.h"
 #include "kudu/util/countdown_latch.h"
+#include "kudu/util/process_memory.h"
 #include "kudu/util/stopwatch.h"
 
 DEFINE_int32(runtime_secs, 10,
@@ -138,6 +139,16 @@ TEST_F(TSStressTest, TestMTInserts) {
   // Ensure the timeout thread is stopped before exiting.
   stop_latch_.CountDown();
   if (timeout_thread.joinable()) timeout_thread.join();
+
+#ifdef TCMALLOC_ENABLED
+  // In TCMalloc-enabled builds, verify that our incremental memory tracking matches the
+  // actual memory consumed, within half a percent.
+  int64_t consumption = process_memory::CurrentConsumption();
+  LOG(INFO) << "consumption: " << consumption;
+  ASSERT_NEAR(process_memory::GetTCMallocCurrentAllocatedBytes(),
+              consumption,
+              consumption * 0.005);
+#endif
 }
 
 } // namespace tserver

http://git-wip-us.apache.org/repos/asf/kudu/blob/b15ea202/src/kudu/util/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/src/kudu/util/CMakeLists.txt b/src/kudu/util/CMakeLists.txt
index 04fe5e8..c8e0750 100644
--- a/src/kudu/util/CMakeLists.txt
+++ b/src/kudu/util/CMakeLists.txt
@@ -367,6 +367,7 @@ ADD_KUDU_TEST(oid_generator-test)
 ADD_KUDU_TEST(once-test)
 ADD_KUDU_TEST(os-util-test)
 ADD_KUDU_TEST(path_util-test)
+ADD_KUDU_TEST(process_memory-test RUN_SERIAL true)
 ADD_KUDU_TEST(random-test)
 ADD_KUDU_TEST(random_util-test)
 ADD_KUDU_TEST(resettable_heartbeater-test)

http://git-wip-us.apache.org/repos/asf/kudu/blob/b15ea202/src/kudu/util/process_memory-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/util/process_memory-test.cc b/src/kudu/util/process_memory-test.cc
new file mode 100644
index 0000000..38922f5
--- /dev/null
+++ b/src/kudu/util/process_memory-test.cc
@@ -0,0 +1,71 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <atomic>
+#include <thread>
+#include <vector>
+
+#include "kudu/util/monotime.h"
+#include "kudu/util/process_memory.h"
+#include "kudu/util/test_util.h"
+
+using std::atomic;
+using std::thread;
+using std::vector;
+
+namespace kudu {
+
+// Microbenchmark for our new/delete hooks which track process-wide
+// memory consumption.
+TEST(ProcessMemory, BenchmarkConsumptionTracking) {
+  const int kNumThreads = 200;
+  vector<thread> threads;
+  atomic<bool> done(false);
+  atomic<int64_t> total_count(0);
+
+  // We start many threads, each of which performs 10:1 ratio of
+  // new/delete pairs to consumption lookups. The high number
+  // of threads highlights when there is contention on central
+  // tcmalloc locks.
+  for (int i = 0; i < kNumThreads; i++) {
+    threads.emplace_back([&]() {
+        int64_t local_count = 0;
+        while (!done) {
+          for (int a = 0; a < 10; a++) {
+            // Mark 'x' volatile so that the compiler does not optimize out the
+            // allocation.
+            char* volatile x = new char[8000];
+            delete[] x;
+          }
+          process_memory::CurrentConsumption();
+          local_count++;
+        }
+        total_count += local_count;
+      });
+  }
+  double secs = 3;
+  SleepFor(MonoDelta::FromSeconds(secs));
+  done = true;
+
+  for (auto& t : threads) {
+    t.join();
+  }
+
+  LOG(INFO) << "Performed " << total_count / secs << " iters/sec";
+}
+
+} // namespace kudu

http://git-wip-us.apache.org/repos/asf/kudu/blob/b15ea202/src/kudu/util/process_memory.cc
----------------------------------------------------------------------
diff --git a/src/kudu/util/process_memory.cc b/src/kudu/util/process_memory.cc
index d3868c2..c4fb8d7 100644
--- a/src/kudu/util/process_memory.cc
+++ b/src/kudu/util/process_memory.cc
@@ -19,7 +19,9 @@
 
 #include <gflags/gflags.h>
 #include <gperftools/malloc_extension.h>
+#include <gperftools/malloc_hook.h>
 
+#include "kudu/gutil/once.h"
 #include "kudu/gutil/strings/substitute.h"
 #include "kudu/util/debug/trace_event.h"
 #include "kudu/util/env.h"
@@ -27,6 +29,7 @@
 #include "kudu/util/mem_tracker.h"
 #include "kudu/util/process_memory.h"
 #include "kudu/util/random.h"
+#include "kudu/util/striped64.h"
 
 DEFINE_int64(memory_limit_hard_bytes, 0,
              "Maximum amount of memory this daemon should use, in bytes. "
@@ -77,6 +80,10 @@ Atomic64 g_released_memory_since_gc;
 // TODO(todd): this is a stopgap.
 const int64_t GC_RELEASE_SIZE = 128 * 1024L * 1024L;
 
+// The total memory consumption of the process, incrementally tracked by
+// New/Delete hooks.
+LongAdder* g_total_consumption = nullptr;
+
 #endif // TCMALLOC_ENABLED
 
 } // anonymous namespace
@@ -114,7 +121,7 @@ static int64_t GetTCMallocProperty(const char* prop) {
   return value;
 }
 
-static int64_t GetTCMallocCurrentAllocatedBytes() {
+int64_t GetTCMallocCurrentAllocatedBytes() {
   return GetTCMallocProperty("generic.current_allocated_bytes");
 }
 
@@ -145,38 +152,65 @@ void GcTcmalloc() {
 // Consumption and soft memory limit behavior
 // ------------------------------------------------------------
 namespace {
+void DoInitLimits() {
+  int64_t limit = FLAGS_memory_limit_hard_bytes;
+  if (limit == 0) {
+    // If no limit is provided, we'll use 80% of system RAM.
+    int64_t total_ram;
+    CHECK_OK(Env::Default()->GetTotalRAMBytes(&total_ram));
+    limit = total_ram * 4;
+    limit /= 5;
+  }
+  g_hard_limit = limit;
+  g_soft_limit = FLAGS_memory_limit_soft_percentage * g_hard_limit / 100;
+
+  g_rand = new ThreadSafeRandom(1);
+
+  LOG(INFO) << StringPrintf("Process hard memory limit is %.6f GB",
+                            (static_cast<float>(g_hard_limit) / (1024.0 * 1024.0 *
1024.0)));
+  LOG(INFO) << StringPrintf("Process soft memory limit is %.6f GB",
+                            (static_cast<float>(g_soft_limit) /
+                             (1024.0 * 1024.0 * 1024.0)));
+}
+
 void InitLimits() {
-  static std::once_flag once;
-  std::call_once(once, [&]() {
-      int64_t limit = FLAGS_memory_limit_hard_bytes;
-      if (limit == 0) {
-        // If no limit is provided, we'll use 80% of system RAM.
-        int64_t total_ram;
-        CHECK_OK(Env::Default()->GetTotalRAMBytes(&total_ram));
-        limit = total_ram * 4;
-        limit /= 5;
-      }
-      g_hard_limit = limit;
-      g_soft_limit = FLAGS_memory_limit_soft_percentage * g_hard_limit / 100;
-
-      g_rand = new ThreadSafeRandom(1);
-
-      LOG(INFO) << StringPrintf("Process hard memory limit is %.6f GB",
-                                (static_cast<float>(g_hard_limit) / (1024.0 * 1024.0
* 1024.0)));
-      LOG(INFO) << StringPrintf("Process soft memory limit is %.6f GB",
-                                (static_cast<float>(g_soft_limit) /
-                                 (1024.0 * 1024.0 * 1024.0)));
-    });
+  static GoogleOnceType once;
+  GoogleOnceInit(&once, &DoInitLimits);
 }
+
+#ifdef TCMALLOC_ENABLED
+
+void NewHook(const void* ptr, size_t size) {
+  // We ignore 'size' because it's possible the allocated size is actually
+  // rounded up to the next biggest slab class.
+  ssize_t alloc_size = MallocExtension::instance()->GetAllocatedSize(ptr);
+  g_total_consumption->IncrementBy(alloc_size);
+}
+void DeleteHook(const void* ptr) {
+  ssize_t size = MallocExtension::instance()->GetAllocatedSize(ptr);
+  g_total_consumption->IncrementBy(-size);
+}
+
+void InitConsumptionHook() {
+  CHECK(!g_total_consumption);
+  g_total_consumption = new LongAdder();
+  g_total_consumption->IncrementBy(GetTCMallocCurrentAllocatedBytes());
+  MallocHook::AddNewHook(&NewHook);
+  MallocHook::AddDeleteHook(&DeleteHook);
+}
+#endif // TCMALLOC_ENABLED
+
 } // anonymous namespace
 
 int64_t CurrentConsumption() {
-  // TODO(todd): this is slow to call frequently, since it takes the tcmalloc
-  // global lock. We should look into whether it's faster to hook malloc/free
-  // and update a LongAdder instead, or otherwise make this more incrementally
-  // tracked.
 #ifdef TCMALLOC_ENABLED
-  return GetTCMallocCurrentAllocatedBytes();
+  // Calling GetTCMallocCurrentAllocatedBytes() is too slow to do frequently, since it has
to
+  // take the tcmalloc global lock and traverse all live threads to add up their caches.
+  // Instead, we install a new/delete hook to incrementally track allocated memory, and
+  // then just return the currently tracked value here.
+  static GoogleOnceType once;
+  GoogleOnceInit(&once, &InitConsumptionHook);
+  return g_total_consumption->Value();
 #else
   // Without tcmalloc, we have no reliable way of determining our own heap
   // size (e.g. mallinfo doesn't work in ASAN builds). So, we'll fall back

http://git-wip-us.apache.org/repos/asf/kudu/blob/b15ea202/src/kudu/util/process_memory.h
----------------------------------------------------------------------
diff --git a/src/kudu/util/process_memory.h b/src/kudu/util/process_memory.h
index 2868bf5..7b09f8b 100644
--- a/src/kudu/util/process_memory.h
+++ b/src/kudu/util/process_memory.h
@@ -37,5 +37,13 @@ int64_t CurrentConsumption();
 // Return the configured hard limit for the process.
 int64_t HardLimit();
 
+#ifdef TCMALLOC_ENABLED
+// Get the current amount of allocated memory, according to tcmalloc.
+//
+// This should be equal to CurrentConsumption(), but is made available so that tests
+// can verify the correctness of CurrentConsumption().
+int64_t GetTCMallocCurrentAllocatedBytes();
+#endif
+
 } // namespace process_memory
 } // namespace kudu


Mime
View raw message