kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From t...@apache.org
Subject [1/2] kudu git commit: cfile_set: reduce memory usage of reader map
Date Fri, 24 Mar 2017 19:04:03 GMT
Repository: kudu
Updated Branches:
  refs/heads/master 47520a7e8 -> cf25ec5c4


cfile_set: reduce memory usage of reader map

We previously used an unordered_map<int, shared_ptr<...>>. This patch
changes to using boost's flat_map container, which is based on sorted
entries in a vector. This means we get one contiguous memory location vs
a bunch of smaller per-item allocations in the separate-chained
hashtable. Additionally switches to unique_ptr which avoids an extra
16-byte control structure per CFileReader.

It's hard to measure the exact win here, but I wrote a little test
program which compared inserting 40 entries into the two types of maps,
leaking the maps, and looking at the LeakSanitizer reports. With the old
map and shared_ptr, it cost 2832 bytes, and with the flat_map and
unique_ptr, it only took 824 bytes. So, we saved about 50 bytes per
entry.

Extrapolating this out to a server with 600k cfiles open, we'll save
around 28MB. Not huge, but worth doing.

Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
Reviewed-on: http://gerrit.cloudera.org:8080/6466
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/02cbf883
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/02cbf883
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/02cbf883

Branch: refs/heads/master
Commit: 02cbf8835d72157c348d8b4c1e34611b0bc05f44
Parents: 47520a7
Author: Todd Lipcon <todd@apache.org>
Authored: Thu Mar 23 18:26:35 2017 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Fri Mar 24 18:46:12 2017 +0000

----------------------------------------------------------------------
 src/kudu/tablet/cfile_set.cc | 9 +++++----
 src/kudu/tablet/cfile_set.h  | 6 ++++--
 2 files changed, 9 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/02cbf883/src/kudu/tablet/cfile_set.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/cfile_set.cc b/src/kudu/tablet/cfile_set.cc
index a98b918..9b70c92 100644
--- a/src/kudu/tablet/cfile_set.cc
+++ b/src/kudu/tablet/cfile_set.cc
@@ -45,6 +45,7 @@ using cfile::ReaderOptions;
 using cfile::DefaultColumnValueIterator;
 using fs::ReadableBlock;
 using std::shared_ptr;
+using std::unique_ptr;
 using strings::Substitute;
 
 ////////////////////////////////////////////////////////////
@@ -104,10 +105,11 @@ Status CFileSet::DoOpen() {
                              parent_mem_tracker_,
                              rowset_metadata_->column_data_block_for_col_id(col_id),
                              &reader));
-    readers_by_col_id_[col_id] = shared_ptr<CFileReader>(reader.release());
+    readers_by_col_id_[col_id] = unique_ptr<CFileReader>(reader.release());
     VLOG(1) << "Successfully opened cfile for column id " << col_id
             << " in " << rowset_metadata_->ToString();
   }
+  readers_by_col_id_.shrink_to_fit();
 
   if (rowset_metadata_->has_adhoc_index_block()) {
     RETURN_NOT_OK(OpenReader(rowset_metadata_->fs_manager(),
@@ -196,9 +198,8 @@ Status CFileSet::GetBounds(string* min_encoded_key,
 
 uint64_t CFileSet::EstimateOnDiskSize() const {
   uint64_t ret = 0;
-  for (const ReaderMap::value_type& e : readers_by_col_id_) {
-    const shared_ptr<CFileReader> &reader = e.second;
-    ret += reader->file_size();
+  for (const auto& e : readers_by_col_id_) {
+    ret += e.second->file_size();
   }
   return ret;
 }

http://git-wip-us.apache.org/repos/asf/kudu/blob/02cbf883/src/kudu/tablet/cfile_set.h
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/cfile_set.h b/src/kudu/tablet/cfile_set.h
index 288d105..7cfcfa6 100644
--- a/src/kudu/tablet/cfile_set.h
+++ b/src/kudu/tablet/cfile_set.h
@@ -17,10 +17,10 @@
 #ifndef KUDU_TABLET_LAYER_BASEDATA_H
 #define KUDU_TABLET_LAYER_BASEDATA_H
 
+#include <boost/container/flat_map.hpp>
 #include <gtest/gtest_prod.h>
 #include <memory>
 #include <string>
-#include <unordered_map>
 #include <vector>
 
 #include "kudu/cfile/bloomfile.h"
@@ -119,7 +119,9 @@ class CFileSet : public std::enable_shared_from_this<CFileSet> {
   std::string max_encoded_key_;
 
   // Map of column ID to reader. These are lazily initialized as needed.
-  typedef std::unordered_map<int, std::shared_ptr<CFileReader> > ReaderMap;
+  // We use flat_map here since it's the most memory-compact while
+  // still having good performance for small maps.
+  typedef boost::container::flat_map<int, std::unique_ptr<CFileReader>> ReaderMap;
   ReaderMap readers_by_col_id_;
 
   // A file reader for an ad-hoc index, i.e. an index that sits in its own file


Mime
View raw message