kudu-commits mailing list archives

From t...@apache.org
Subject incubator-kudu git commit: Change flush defaults to encourage parallel IO and larger flushes
Date Wed, 25 May 2016 20:10:42 GMT
Repository: incubator-kudu
Updated Branches:
  refs/heads/master f9aa4ee37 -> b27bb312b


Change flush defaults to encourage parallel IO and larger flushes

Based on some recent experiments with high throughput writes using YCSB[1],
these defaults make more sense for the typical throughput-oriented applications
that Kudu is currently targeting.

[1] http://getkudu.io/2016/04/26/ycsb.html

Change-Id: I1c70d9c76ed33bbfca5480e1d1f343c6dab36d3b
Reviewed-on: http://gerrit.cloudera.org:8080/3186
Tested-by: Kudu Jenkins
Reviewed-by: Jean-Daniel Cryans


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/b27bb312
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/b27bb312
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/b27bb312

Branch: refs/heads/master
Commit: b27bb312b5f12126e3a150d0885cd9b178f42336
Parents: f9aa4ee
Author: Todd Lipcon <todd@apache.org>
Authored: Mon May 23 22:23:30 2016 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Wed May 25 20:10:09 2016 +0000

----------------------------------------------------------------------
 docs/release_notes.adoc               |  6 ++++++
 src/kudu/cfile/cfile_writer.cc        | 20 +++++++++++++-------
 src/kudu/fs/block_manager.cc          |  7 -------
 src/kudu/tablet/tablet_peer-test.cc   |  3 +++
 src/kudu/tablet/tablet_peer_mm_ops.cc |  5 +++--
 5 files changed, 25 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/b27bb312/docs/release_notes.adoc
----------------------------------------------------------------------
diff --git a/docs/release_notes.adoc b/docs/release_notes.adoc
index b3889c0..44ec278 100644
--- a/docs/release_notes.adoc
+++ b/docs/release_notes.adoc
@@ -88,6 +88,12 @@ Hadoop storage technologies.
   instead of 10, the default RPC timeout is now 10 seconds instead of 5, and the default scan
   timeout is now 30 seconds instead of 15.
 
+- Some default settings related to I/O behavior during flushes and compactions have been changed:
+  The default for `flush_threshold_mb` has been increased from 64MB to 1000MB. The default
+  `cfile_do_on_finish` has been changed from `close` to `flush`.
+  link:http://getkudu.io/2016/04/26/ycsb.html[Experiments using YCSB] indicate that these
+  values will provide better throughput for write-heavy applications on typical server hardware.
+
 [[rn_0.8.0]]
 === Release notes specific to 0.8.0
 

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/b27bb312/src/kudu/cfile/cfile_writer.cc
----------------------------------------------------------------------
diff --git a/src/kudu/cfile/cfile_writer.cc b/src/kudu/cfile/cfile_writer.cc
index 8ceb97b..e739416 100644
--- a/src/kudu/cfile/cfile_writer.cc
+++ b/src/kudu/cfile/cfile_writer.cc
@@ -45,14 +45,20 @@ DEFINE_string(cfile_default_compression_codec, "none",
               "Default cfile block compression codec.");
 TAG_FLAG(cfile_default_compression_codec, advanced);
 
-// The default value is optimized for the case where:
-// 1. the cfile blocks are colocated with the WALs.
-// 2. The underlying hardware is a spinning disk.
-// 3. The underlying filesystem is either XFS or EXT4.
-// 4. block_coalesce_close is false (see fs/block_manager.cc).
+// The default value is optimized for throughput in the case that
+// there are multiple drives backing the tablet. By asynchronously
+// flushing each cfile before issuing any fsyncs, the IO across
+// disks is done in parallel.
 //
-// When all conditions hold, this value ensures low latency for WAL writes.
-DEFINE_string(cfile_do_on_finish, "close",
+// This increases throughput but can harm latency in the case that
+// there are few disks and the WAL is on the same disk as the
+// data blocks. The default is chosen based on the assumptions that:
+// - latency is leveled across machines by Raft
+// - latency-sensitive applications can devote a disk to the WAL
+// - super-sensitive applications can devote an SSD to the WAL.
+// - users could always change this to "close", which slows down throughput
+//   but may improve write latency.
+DEFINE_string(cfile_do_on_finish, "flush",
               "What to do to cfile blocks when writing is finished. "
               "Possible values are 'close', 'flush', or 'nothing'.");
 TAG_FLAG(cfile_do_on_finish, experimental);
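
Editor's note: the ordering difference described in the comment above can be sketched as follows. This is a minimal illustration, not Kudu's implementation: FakeBlock and FinishBlocks are hypothetical names, and the block only records which call would be issued instead of doing real IO. It assumes 'close' syncs each block as soon as it is finished, while 'flush' starts an asynchronous flush per block and defers every sync until all blocks have been written.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a cfile block: records calls instead of doing IO.
struct FakeBlock {
  int id;
  std::vector<std::string>* log;
  // Assumed to start writeback without waiting for it to complete.
  void FlushDataAsync() { log->push_back("flush " + std::to_string(id)); }
  // Assumed to wait for the data to be durable, then close the block.
  void Close() { log->push_back("sync+close " + std::to_string(id)); }
};

// Returns the order of calls made when finishing `num_blocks` blocks under
// the given mode ("close", "flush", or "nothing").
std::vector<std::string> FinishBlocks(const std::string& do_on_finish,
                                      int num_blocks) {
  std::vector<std::string> log;
  std::vector<FakeBlock> deferred;
  for (int i = 0; i < num_blocks; ++i) {
    FakeBlock block{i, &log};
    if (do_on_finish == "close") {
      // Sync immediately: each block waits on its own disk in turn.
      block.Close();
    } else {
      if (do_on_finish == "flush") {
        // Kick off writeback now so all disks can work in parallel.
        block.FlushDataAsync();
      }
      deferred.push_back(block);
    }
  }
  // Deferred blocks are synced only after every block has been written.
  for (FakeBlock& block : deferred) {
    block.Close();
  }
  return log;
}
```

With two blocks, "flush" yields flush 0, flush 1, sync+close 0, sync+close 1, so both writebacks are in flight before the first sync blocks. With "close", each block is synced before the next one is touched.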

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/b27bb312/src/kudu/fs/block_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/fs/block_manager.cc b/src/kudu/fs/block_manager.cc
index 1add6d1..f50c59a 100644
--- a/src/kudu/fs/block_manager.cc
+++ b/src/kudu/fs/block_manager.cc
@@ -19,13 +19,6 @@
 #include "kudu/util/flag_tags.h"
 #include "kudu/util/metrics.h"
 
-// The default value is optimized for the case where:
-// 1. the cfile blocks are colocated with the WALs.
-// 2. The underlying hardware is a spinning disk.
-// 3. The underlying filesystem is either XFS or EXT4.
-// 4. cfile_do_on_finish is 'close' (see cfile/cfile_writer.cc).
-//
-// When all conditions hold, this value ensures low latency for WAL writes.
 DEFINE_bool(block_coalesce_close, false,
             "Coalesce synchronization of data during CloseBlocks()");
 TAG_FLAG(block_coalesce_close, experimental);

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/b27bb312/src/kudu/tablet/tablet_peer-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/tablet_peer-test.cc b/src/kudu/tablet/tablet_peer-test.cc
index 65cc5f3..302711b 100644
--- a/src/kudu/tablet/tablet_peer-test.cc
+++ b/src/kudu/tablet/tablet_peer-test.cc
@@ -48,6 +48,7 @@
 METRIC_DECLARE_entity(tablet);
 
 DECLARE_int32(log_min_seconds_to_retain);
+DECLARE_int32(flush_threshold_mb);
 
 namespace kudu {
 namespace tablet {
@@ -542,6 +543,8 @@ TEST_F(TabletPeerTest, TestGCEmptyLog) {
 }
 
 TEST_F(TabletPeerTest, TestFlushOpsPerfImprovements) {
+  FLAGS_flush_threshold_mb = 64;
+
   MaintenanceOpStats stats;
 
   // Just on the threshold and not enough time has passed for a time-based flush.

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/b27bb312/src/kudu/tablet/tablet_peer_mm_ops.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/tablet_peer_mm_ops.cc b/src/kudu/tablet/tablet_peer_mm_ops.cc
index 4d1e43b..c8bc920 100644
--- a/src/kudu/tablet/tablet_peer_mm_ops.cc
+++ b/src/kudu/tablet/tablet_peer_mm_ops.cc
@@ -29,9 +29,10 @@
 #include "kudu/util/flag_tags.h"
 #include "kudu/util/metrics.h"
 
-DEFINE_int32(flush_threshold_mb, 64,
+DEFINE_int32(flush_threshold_mb, 1024,
              "Size at which MemRowSet flushes are triggered. "
-             "A MRS can still flush below this threshold if it if hasn't flushed in a while");
+             "A MRS can still flush below this threshold if it if hasn't flushed in a while, "
+             "or if the server-wide memory limit has been reached.");
 TAG_FLAG(flush_threshold_mb, experimental);
 
 METRIC_DEFINE_gauge_uint32(tablet, log_gc_running,
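
Editor's note: the three flush triggers named in the flag description (size threshold, time since last flush, server-wide memory limit) can be sketched as a simple predicate. This is a deliberately simplified boolean model under stated assumptions. The actual maintenance op in this file reports statistics to a scheduler and does not return a yes/no answer. FlushInputs, ShouldFlush, and max_secs_between_flushes are hypothetical names introduced only for illustration.

```cpp
#include <cstdint>

// Hypothetical inputs to a flush decision for one MemRowSet (MRS).
struct FlushInputs {
  std::int64_t mrs_size_mb;               // current in-memory size of the MRS
  std::int64_t flush_threshold_mb;        // value of --flush_threshold_mb
  std::int64_t secs_since_last_flush;     // time since this MRS last flushed
  std::int64_t max_secs_between_flushes;  // assumed time-based limit
  bool server_memory_limit_reached;       // server-wide memory pressure
};

// Returns true if any of the three triggers from the flag description fires.
bool ShouldFlush(const FlushInputs& in) {
  if (in.mrs_size_mb <= 0) {
    return false;  // nothing buffered, so nothing to flush
  }
  if (in.mrs_size_mb >= in.flush_threshold_mb) {
    return true;  // size-based trigger
  }
  if (in.server_memory_limit_reached) {
    return true;  // memory-pressure trigger, even below the threshold
  }
  // Time-based trigger: a small MRS still flushes eventually.
  return in.secs_since_last_flush >= in.max_secs_between_flushes;
}
```

Under this model a 100 MB MRS flushes immediately with the old 64 MB default, but with the new 1024 MB default it keeps accumulating until it grows further, ages out, or the server comes under memory pressure, which is what produces larger flushes.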

