kudu-commits mailing list archives

From t...@apache.org
Subject [2/2] kudu git commit: KUDU-1836. Enable compression of DeltaFiles
Date Mon, 23 Jan 2017 23:05:51 GMT
KUDU-1836. Enable compression of DeltaFiles

This adds a new experimental flag, deltafile_default_compression_codec,
which selects the codec used when writing deltafiles, and sets its
default to LZ4. LZ4 is quite fast and seems to do a decent job of
compression in real-life scenarios.

I gathered a couple of numbers from a ~10GB tablet exported from a use
case at Cloudera which has a lot of UPSERTs. In particular, this workload
has many cases where a row gets upserted but the new value is no
different from the previous contents of the row, so multiple deltas for a
row are basically duplicates and highly compressible. This is obviously
close to a best case, but it is not a contrived one (this is a real app):

Codec       Total size of deltas   Ratio
NONE        10458MB
LZO         413MB                  (25x)
GZIP        296MB                  (35x)

The above numbers come from running the deltafile through 'lzop' and
'gzip', rather than from CFile compression, which is limited to a smaller
block size (32KB by default for deltafiles, per
deltafile_default_block_size). Results with CFile compression will
therefore not be quite as good. However, they are still likely to be 10x
or better, which is substantial.
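
The ratios in the table can be rechecked with a few lines of arithmetic. The following is a minimal sketch, not part of the commit; the function names CompressionRatio and PrintTableRatios are hypothetical, and the only inputs are the sizes quoted above.

```cpp
#include <cstdio>

// Hypothetical helper (not Kudu code): ratio of uncompressed size to
// compressed size, with both sizes given in the same unit (MB here).
double CompressionRatio(double uncompressed_mb, double compressed_mb) {
  return uncompressed_mb / compressed_mb;
}

// Prints the ratios for the delta sizes quoted in the table above.
void PrintTableRatios() {
  const double kUncompressedMb = 10458.0;  // "NONE" row
  std::printf("LZO:  %.1fx\n", CompressionRatio(kUncompressedMb, 413.0));
  std::printf("GZIP: %.1fx\n", CompressionRatio(kUncompressedMb, 296.0));
}
```

By the same arithmetic, a 10x ratio on this tablet would correspond to roughly 1046MB of deltas.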

Change-Id: I754b31c63ef6c5d7b4ffbcbb0ad8982f9978ca83
Reviewed-on: http://gerrit.cloudera.org:8080/5737
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dralves@apache.org>

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/ef57bda2
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/ef57bda2
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/ef57bda2

Branch: refs/heads/master
Commit: ef57bda2c55154ca44c40e00602e9e3de891fa85
Parents: 45b7dba
Author: Todd Lipcon <todd@apache.org>
Authored: Wed Jan 18 18:23:52 2017 -0800
Committer: Todd Lipcon <todd@apache.org>
Committed: Mon Jan 23 22:46:37 2017 +0000

 src/kudu/tablet/deltafile.cc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/kudu/tablet/deltafile.cc b/src/kudu/tablet/deltafile.cc
index 1664133..3975f4c 100644
--- a/src/kudu/tablet/deltafile.cc
+++ b/src/kudu/tablet/deltafile.cc
@@ -32,6 +32,7 @@
 #include "kudu/tablet/mutation.h"
 #include "kudu/tablet/mvcc.h"
 #include "kudu/util/coding-inl.h"
+#include "kudu/util/compression/compression_codec.h"
 #include "kudu/util/flag_tags.h"
 #include "kudu/util/hexdump.h"
 #include "kudu/util/pb_util.h"
@@ -43,6 +44,10 @@ DEFINE_int32(deltafile_default_block_size, 32*1024,
              "on a per-table basis.");
 TAG_FLAG(deltafile_default_block_size, experimental);
+DEFINE_string(deltafile_default_compression_codec, "lz4",
+              "The compression codec used when writing deltafiles.");
+TAG_FLAG(deltafile_default_compression_codec, experimental);
 using std::shared_ptr;
 using std::unique_ptr;
@@ -74,6 +79,8 @@ DeltaFileWriter::DeltaFileWriter(gscoped_ptr<WritableBlock> block)
   opts.write_validx = true;
   opts.storage_attributes.cfile_block_size = FLAGS_deltafile_default_block_size;
   opts.storage_attributes.encoding = PLAIN_ENCODING;
+  opts.storage_attributes.compression = GetCompressionCodecType(
+      FLAGS_deltafile_default_compression_codec);
   // No optimization for deltafiles because a deltafile index key must decode into a DeltaKey
   opts.optimize_index_keys = false;
   writer_.reset(new cfile::CFileWriter(opts, GetTypeInfo(BINARY), false, std::move(block)));
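
The last hunk passes the flag's string value through GetCompressionCodecType to obtain a codec type. As an illustration of that kind of string-to-enum mapping, here is a self-contained sketch. It does not reproduce Kudu's implementation: the SketchCodec enum, the ParseCodecName function, the accepted names other than "lz4", the case-insensitive matching, and the fallback to no compression are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for a codec enum; not Kudu's actual definition.
enum class SketchCodec { kNone, kSnappy, kLz4, kZlib };

// Maps a codec name, such as a flag value, to an enum value.
// Assumptions: matching is case-insensitive, and an unrecognized name
// falls back to no compression.
SketchCodec ParseCodecName(std::string name) {
  // Upper-case the name so that "lz4" and "LZ4" are treated alike.
  std::transform(name.begin(), name.end(), name.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::toupper(c));
                 });
  if (name == "LZ4") return SketchCodec::kLz4;
  if (name == "SNAPPY") return SketchCodec::kSnappy;
  if (name == "ZLIB") return SketchCodec::kZlib;
  return SketchCodec::kNone;
}
```

With this sketch, ParseCodecName("lz4") yields SketchCodec::kLz4, matching the flag's default value in the diff above.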
