kudu-commits mailing list archives

From a...@apache.org
Subject [8/8] kudu git commit: tpch: improve encodings and compression
Date Thu, 12 Jan 2017 08:23:32 GMT
tpch: improve encodings and compression

Previously all of the columns had been hard-coded to 'PLAIN' encoding.
This is no longer our default, nor would we recommend it for the types
of data used in the TPCH dataset.

This switches to the default encodings everywhere, and also enables LZ4
compression on the "Comment" column.

The reduction in data size is as follows:

original:
  size: 993MB
  median scan time for TPCH1 query: 0.8685 sec

with LZ4 'comment':
  size: 901MB (1.1x compression vs original)
  scan time: unaffected (query does not read comment column)

with LZ4 'comment' and new encodings:
  size: 342MB (2.9x compression vs original)
  median scan time: 0.8488 sec

Per the above, the on-disk size is reduced by almost 3x and the median
scan time improves by about 2% (0.8685 sec to 0.8488 sec), which may be
within measurement error. This workload is small enough to be fully
RAM-resident, but on a larger dataset that is disk-bound on reads, the
space reduction should yield a corresponding improvement in scan performance.
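
As a quick check of the arithmetic behind the figures above, here is a
minimal sketch that recomputes the ratios from the quoted measurements.
The helper names (CompressionRatio, PercentFaster, PrintSummary) are
hypothetical and are not part of the Kudu source tree.

```cpp
#include <cassert>
#include <cstdio>

// Ratio of the original on-disk size to the reduced size
// (a value of 2.9 means the data is 2.9x smaller).
inline double CompressionRatio(double original_mb, double reduced_mb) {
  return original_mb / reduced_mb;
}

// Relative reduction in scan time, as a percentage of the original time.
inline double PercentFaster(double original_sec, double new_sec) {
  return 100.0 * (original_sec - new_sec) / original_sec;
}

// Prints the derived figures for the measurements quoted in this message.
inline void PrintSummary() {
  std::printf("LZ4 'comment' only:        %.1fx\n",
              CompressionRatio(993, 901));
  std::printf("LZ4 + default encodings:   %.1fx\n",
              CompressionRatio(993, 342));
  std::printf("median scan time change:   %.1f%% faster\n",
              PercentFaster(0.8685, 0.8488));
}
```

With the sizes and timings listed above, the two ratios come out to
roughly 1.1x and 2.9x, and the scan time difference to roughly 2.3%.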

Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Reviewed-on: http://gerrit.cloudera.org:8080/5689
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <adar@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/9d29424e
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/9d29424e
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/9d29424e

Branch: refs/heads/master
Commit: 9d29424e790b54e9c0d8d44dbdf954f16aba5377
Parents: 23c3e0d
Author: Todd Lipcon <todd@apache.org>
Authored: Wed Jan 11 16:55:40 2017 -0800
Committer: Adar Dembo <adar@cloudera.com>
Committed: Thu Jan 12 08:22:14 2017 +0000

----------------------------------------------------------------------
 src/kudu/benchmarks/tpch/tpch-schemas.h | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/9d29424e/src/kudu/benchmarks/tpch/tpch-schemas.h
----------------------------------------------------------------------
diff --git a/src/kudu/benchmarks/tpch/tpch-schemas.h b/src/kudu/benchmarks/tpch/tpch-schemas.h
index 20dfe6c..a417191 100644
--- a/src/kudu/benchmarks/tpch/tpch-schemas.h
+++ b/src/kudu/benchmarks/tpch/tpch-schemas.h
@@ -88,14 +88,15 @@ inline client::KuduSchema CreateLineItemSchema() {
   b.AddColumn(kExtendedPriceColName)->Type(kDouble)->NotNull();
   b.AddColumn(kDiscountColName)->Type(kDouble)->NotNull();
   b.AddColumn(kTaxColName)->Type(kDouble)->NotNull();
-  b.AddColumn(kReturnFlagColName)->Type(kString)->NotNull()->Encoding(kPlainEncoding);
-  b.AddColumn(kLineStatusColName)->Type(kString)->NotNull()->Encoding(kPlainEncoding);
-  b.AddColumn(kShipDateColName)->Type(kString)->NotNull()->Encoding(kPlainEncoding);
-  b.AddColumn(kCommitDateColName)->Type(kString)->NotNull()->Encoding(kPlainEncoding);
-  b.AddColumn(kReceiptDateColName)->Type(kString)->NotNull()->Encoding(kPlainEncoding);
-  b.AddColumn(kShipInstructColName)->Type(kString)->NotNull()->Encoding(kPlainEncoding);
-  b.AddColumn(kShipModeColName)->Type(kString)->NotNull()->Encoding(kPlainEncoding);
-  b.AddColumn(kCommentColName)->Type(kString)->NotNull()->Encoding(kPlainEncoding);
+  b.AddColumn(kReturnFlagColName)->Type(kString)->NotNull();
+  b.AddColumn(kLineStatusColName)->Type(kString)->NotNull();
+  b.AddColumn(kShipDateColName)->Type(kString)->NotNull();
+  b.AddColumn(kCommitDateColName)->Type(kString)->NotNull();
+  b.AddColumn(kReceiptDateColName)->Type(kString)->NotNull();
+  b.AddColumn(kShipInstructColName)->Type(kString)->NotNull();
+  b.AddColumn(kShipModeColName)->Type(kString)->NotNull();
+  b.AddColumn(kCommentColName)->Type(kString)->NotNull()
+      ->Compression(client::KuduColumnStorageAttributes::LZ4);
 
   b.SetPrimaryKey({ kOrderKeyColName, kLineNumberColName });
 

