kudu-commits mailing list archives

From a...@apache.org
Subject [1/2] kudu git commit: [c++ client] AUTO_FLUSH_BACKGROUND optimizations
Date Fri, 16 Sep 2016 01:35:04 GMT
Repository: kudu
Updated Branches:
  refs/heads/master af7f9c805 -> 1610b4ac4


[c++ client] AUTO_FLUSH_BACKGROUND optimizations

Optimizations after initial performance testing of the
Kudu C++ client library with AUTO_FLUSH_BACKGROUND flush mode support.

The most important tuning is the default flush watermark
for the mutation buffer.  Changing it from 80% to 50% gave a nearly 30%
boost in throughput for scenarios where a client pushes
data to the server as fast as it can, using workloads of 8M rows like
(int64, int32, string, string, int) where strings are about 32 bytes
long on average.  Each thread ran its own single-session KuduClient,
where each session was running in AUTO_FLUSH_BACKGROUND flush mode.

1-thread insertion (8M rows per thread)
  80% watermark:
    total  : 35229.7 ms
    per row: 0.00440372 ms

  50% watermark:
    total  : 22562.8 ms
    per row: 0.00282035 ms

2-thread insertion (4M rows per thread)
  80% watermark:
    total  : 19683.6 ms
    per row: 0.00246046 ms

  50% watermark:
    total  : 12931.8 ms
    per row: 0.00161647 ms

4-thread insertion (2M rows per thread)
  80% watermark:
    total  : 11941.9 ms
    per row: 0.00149274 ms

  50% watermark:
    total  : 7724.68 ms
    per row: 0.000965585 ms
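
As a rough cross-check of the figures above, the sketch below derives the
per-row time and the relative change from the reported totals.  The helper
names are made up for this illustration and are not part of the Kudu code
base.  Reading "8M" as 8,000,000 rows in total is an assumption, chosen
because it reproduces the reported per-row values.

```cpp
#include <cassert>
#include <cstdint>

// Average time per row: total elapsed time divided by the total row count.
double PerRowMs(double total_ms, int64_t rows) {
  assert(rows > 0);
  return total_ms / static_cast<double>(rows);
}

// Fraction by which the elapsed time shrank: 1 - after / before.
double TimeReduction(double before_ms, double after_ms) {
  assert(before_ms > 0.0);
  return 1.0 - after_ms / before_ms;
}

// Throughput ratio for a fixed amount of work: before / after.
double ThroughputRatio(double before_ms, double after_ms) {
  assert(after_ms > 0.0);
  return before_ms / after_ms;
}
```

With the totals above, TimeReduction gives roughly 0.36, 0.34 and 0.35 for
1, 2 and 4 threads, i.e. about a third less elapsed time for the same rows.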

Other related session parameters:
  mutation buffer size:       7M       (default)
  maximum number of batchers: 2        (default)
  time-based flush interval:  1 second (default)
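
To put the watermark in absolute terms, the sketch below converts a
percentage into a byte threshold for the default buffer size (7 * 1024 * 1024
bytes, as in the session-internal.cc hunk of this commit).  WatermarkBytes is
a hypothetical helper, and treating the watermark as an integer percentage of
the buffer limit is an assumption; the commit does not show how the client
evaluates the threshold internally.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the Kudu API: the number of buffered
// bytes at which a background flush would start, assuming the watermark is
// a plain integer percentage of the buffer size limit.
int64_t WatermarkBytes(int64_t buffer_bytes_limit, int32_t watermark_pct) {
  assert(buffer_bytes_limit >= 0);
  assert(watermark_pct >= 0 && watermark_pct <= 100);
  return buffer_bytes_limit * watermark_pct / 100;
}

// Default mutation buffer size used in the tests.
const int64_t kDefaultBufferBytes = 7 * 1024 * 1024;
```

Under that assumption, the old 80% default corresponds to 5872025 bytes and
the new 50% default to 3670016 bytes, so the first flush starts about 2.1 MiB
earlier and more of the buffer stays free while the flush is in flight.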

The tests were run on a machine with dual Intel(R) Xeon(R) CPU E5-2680 v3
@ 2.50GHz (12 cores per CPU) and 98GiB of memory.

This is a follow-up to 93be1310d227cf05025864654ca3f6713c2ddc2c.

Change-Id: I1f0aa6d02c51bb063498709e8570e8c7214a31a0
Reviewed-on: http://gerrit.cloudera.org:8080/4308
Tested-by: Kudu Jenkins
Tested-by: Alexey Serbin <aserbin@cloudera.com>
Reviewed-by: David Ribeiro Alves <dralves@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/7959cd40
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/7959cd40
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/7959cd40

Branch: refs/heads/master
Commit: 7959cd403c13d1f40c0b8e90ad66a6940e510ba5
Parents: af7f9c8
Author: Alexey Serbin <aserbin@cloudera.com>
Authored: Fri Sep 2 19:39:29 2016 -0700
Committer: Alexey Serbin <aserbin@cloudera.com>
Committed: Fri Sep 16 01:05:24 2016 +0000

----------------------------------------------------------------------
 src/kudu/client/batcher.cc          | 4 ----
 src/kudu/client/batcher.h           | 4 +++-
 src/kudu/client/client.h            | 2 +-
 src/kudu/client/session-internal.cc | 8 ++++----
 src/kudu/client/write_op.cc         | 5 ++---
 5 files changed, 10 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/7959cd40/src/kudu/client/batcher.cc
----------------------------------------------------------------------
diff --git a/src/kudu/client/batcher.cc b/src/kudu/client/batcher.cc
index f80ca8a..de6cfdc 100644
--- a/src/kudu/client/batcher.cc
+++ b/src/kudu/client/batcher.cc
@@ -510,10 +510,6 @@ void Batcher::FlushAsync(KuduStatusCallback* cb) {
   FlushBuffersIfReady();
 }
 
-int64_t Batcher::GetOperationSizeInBuffer(KuduWriteOperation* write_op) {
-  return write_op->SizeInBuffer();
-}
-
 Status Batcher::Add(KuduWriteOperation* write_op) {
   // As soon as we get the op, start looking up where it belongs,
   // so that when the user calls Flush, we are ready to go.

http://git-wip-us.apache.org/repos/asf/kudu/blob/7959cd40/src/kudu/client/batcher.h
----------------------------------------------------------------------
diff --git a/src/kudu/client/batcher.h b/src/kudu/client/batcher.h
index 9851340..566d356 100644
--- a/src/kudu/client/batcher.h
+++ b/src/kudu/client/batcher.h
@@ -124,7 +124,9 @@ class Batcher : public RefCountedThreadSafe<Batcher> {
   }
 
   // Compute in-buffer size for the given write operation.
-  static int64_t GetOperationSizeInBuffer(KuduWriteOperation* write_op);
+  static int64_t GetOperationSizeInBuffer(KuduWriteOperation* write_op) {
+    return write_op->SizeInBuffer();
+  }
 
  private:
   friend class RefCountedThreadSafe<Batcher>;

http://git-wip-us.apache.org/repos/asf/kudu/blob/7959cd40/src/kudu/client/client.h
----------------------------------------------------------------------
diff --git a/src/kudu/client/client.h b/src/kudu/client/client.h
index aa9c92c..5a3f7b7 100644
--- a/src/kudu/client/client.h
+++ b/src/kudu/client/client.h
@@ -1244,7 +1244,7 @@ class KUDU_EXPORT KuduSession : public sp::enable_shared_from_this<KuduSession>
   /// when running in AUTO_FLUSH_BACKGROUND mode: once the specified threshold
   /// is reached, the session starts sending the accumulated write operations
   /// to the appropriate tablet servers. By default, the buffer flush watermark
-  /// is to to 80%.
+  /// is to to 50%.
   ///
   /// @note This setting is applicable only for AUTO_FLUSH_BACKGROUND sessions.
   ///   I.e., calling this method in other flush modes is safe, but

http://git-wip-us.apache.org/repos/asf/kudu/blob/7959cd40/src/kudu/client/session-internal.cc
----------------------------------------------------------------------
diff --git a/src/kudu/client/session-internal.cc b/src/kudu/client/session-internal.cc
index 661288d..6fed4e0 100644
--- a/src/kudu/client/session-internal.cc
+++ b/src/kudu/client/session-internal.cc
@@ -53,7 +53,7 @@ KuduSession::Data::Data(shared_ptr<KuduClient> client,
       batchers_num_(0),
       batchers_num_limit_(2),
       buffer_bytes_limit_(7 * 1024 * 1024),
-      buffer_watermark_pct_(80),
+      buffer_watermark_pct_(50),
       buffer_bytes_used_(0),
       buffer_pre_flush_enabled_(true) {
 }
@@ -323,10 +323,10 @@ Status KuduSession::Data::ApplyWriteOp(
     sp::weak_ptr<KuduSession> weak_session,
     KuduWriteOperation* write_op) {
 
-  if (!write_op) {
+  if (PREDICT_FALSE(!write_op)) {
     return Status::InvalidArgument("NULL operation");
   }
-  if (!write_op->row().IsKeySet()) {
+  if (PREDICT_FALSE(!write_op->row().IsKeySet())) {
     Status status = Status::IllegalState(
         "Key not specified", write_op->ToString());
     error_collector_->AddError(
@@ -358,7 +358,7 @@ Status KuduSession::Data::ApplyWriteOp(
   // A sanity check: before trying to validate against any of run-time metrics,
   // verify that the single operation can fit into an empty buffer
   // given the restriction on the buffer size.
-  if (required_size > max_size) {
+  if (PREDICT_FALSE(required_size > max_size)) {
     Status s = Status::Incomplete(strings::Substitute(
           "buffer size limit is too small to fit operation: "
           "required $0, size limit $1",
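
The PREDICT_FALSE wrappers in the hunks above mark the error paths as
unlikely, which lets the compiler favor the common path.  A minimal sketch of
the idea, assuming a GCC/Clang-style __builtin_expect; the macro
SKETCH_PREDICT_FALSE and the function ValidateOp are hypothetical stand-ins
written for this note, not Kudu's definitions:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in branch hint (assumed shape, not copied from Kudu): tells the
// compiler the condition is expected to be false. Falls back to the plain
// condition on compilers without __builtin_expect.
#if defined(__GNUC__)
#define SKETCH_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#else
#define SKETCH_PREDICT_FALSE(x) (x)
#endif

// Hypothetical validation with the same shape as the checks in the diff:
// rare error conditions are hinted as unlikely, the success path is not.
std::string ValidateOp(const int* op, int64_t required_size, int64_t max_size) {
  assert(max_size >= 0);
  if (SKETCH_PREDICT_FALSE(op == nullptr)) {
    return "NULL operation";
  }
  if (SKETCH_PREDICT_FALSE(required_size > max_size)) {
    return "buffer size limit is too small to fit operation";
  }
  return "OK";
}
```

The hint changes only code layout and branch prediction defaults; the result
of each check is the same with or without it.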

http://git-wip-us.apache.org/repos/asf/kudu/blob/7959cd40/src/kudu/client/write_op.cc
----------------------------------------------------------------------
diff --git a/src/kudu/client/write_op.cc b/src/kudu/client/write_op.cc
index bfb9550..686da98 100644
--- a/src/kudu/client/write_op.cc
+++ b/src/kudu/client/write_op.cc
@@ -77,9 +77,8 @@ int64_t KuduWriteOperation::SizeInBuffer() const {
       size += schema->column(i).type_info()->size();
       if (schema->column(i).type_info()->physical_type() == BINARY) {
         ContiguousRow row(schema, row_.row_data_);
-        Slice bin;
-        memcpy(&bin, row.cell_ptr(i), sizeof(bin));
-        size += bin.size();
+        const Slice* bin = reinterpret_cast<const Slice*>(row.cell_ptr(i));
+        size += bin->size();
       }
     }
   }
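
The write_op.cc change above replaces a copy of the cell contents with an
in-place view.  The sketch below contrasts the two approaches on a
hypothetical FakeSlice type (a pointer plus a length, standing in for
kudu::Slice, whose real layout is not shown in this commit).  The cast
variant assumes the cell really holds a suitably aligned object of that type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for kudu::Slice: a data pointer plus a length.
struct FakeSlice {
  const uint8_t* data;
  size_t size;
};

// Old approach: copy the cell bytes into a local object, then read the size.
size_t SizeViaMemcpy(const uint8_t* cell) {
  assert(cell != nullptr);
  FakeSlice s;
  std::memcpy(&s, cell, sizeof(s));
  return s.size;
}

// New approach: view the cell in place and read the size through a pointer.
// Valid only if a properly aligned FakeSlice object lives at that address.
size_t SizeViaCast(const uint8_t* cell) {
  assert(cell != nullptr);
  const FakeSlice* s = reinterpret_cast<const FakeSlice*>(cell);
  return s->size;
}
```

Both functions return the same value for a cell that stores a FakeSlice; the
cast version just avoids the temporary object and the copy.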

