parquet-commits mailing list archives

From b...@apache.org
Subject parquet-format git commit: PARQUET-1124: Add LZ4 and Zstd compression codecs.
Date Tue, 10 Oct 2017 19:55:30 GMT
Repository: parquet-format
Updated Branches:
  refs/heads/master ddc18a7af -> 84460c5a1


PARQUET-1124: Add LZ4 and Zstd compression codecs.

This adds LZ4 and Zstd compression codecs to the format spec. In recent tests, Zstd appears
to outperform the other codecs (including Brotli on reads). LZ4 is widely available because it
is built into Hadoop, making it a good successor to Snappy for fast compression and decompression
when speed is more important than compression ratio.

Author: Ryan Blue <blue@apache.org>

Closes #70 from rdblue/PARQUET-1124-add-compression-codecs and squashes the following commits:

939328e [Ryan Blue] PARQUET-1124: Add warning about external codec dependencies.
affad3d [Ryan Blue] PARQUET-1124: Add lz4 and zstd compression codecs.


Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/84460c5a
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/84460c5a
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/84460c5a

Branch: refs/heads/master
Commit: 84460c5a1e8aadf52a40dcf2aeb2fc875df4ac2a
Parents: ddc18a7
Author: Ryan Blue <blue@apache.org>
Authored: Tue Oct 10 12:55:27 2017 -0700
Committer: Ryan Blue <blue@apache.org>
Committed: Tue Oct 10 12:55:27 2017 -0700

----------------------------------------------------------------------
 src/main/thrift/parquet.thrift | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/parquet-format/blob/84460c5a/src/main/thrift/parquet.thrift
----------------------------------------------------------------------
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index a4e193e..38cddc7 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -451,13 +451,20 @@ enum Encoding {
 
 /**
  * Supported compression algorithms.
+ *
+ * Codecs added in 2.3.2 can be read by readers based on 2.3.2 and later.
+ * Codec support may vary between readers based on the format version and
+ * libraries available at runtime. Gzip, Snappy, and LZ4 codecs are
+ * widely available, while Zstd and Brotli require additional libraries.
  */
 enum CompressionCodec {
   UNCOMPRESSED = 0;
   SNAPPY = 1;
   GZIP = 2;
   LZO = 3;
-  BROTLI = 4;
+  BROTLI = 4; // Added in 2.3.2
+  LZ4 = 5;    // Added in 2.3.2
+  ZSTD = 6;   // Added in 2.3.2
 }
 
 enum PageType {

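For illustration only (not part of this commit), here is a minimal Python sketch of how a reader might interpret the CompressionCodec values from the diff above. It assumes a hypothetical reader that knows its own format version as a dotted string. The enum values and the "Added in 2.3.2" rule come from the diff. The helper names `parse_version` and `format_allows` are hypothetical, and the sketch does not check runtime library availability, which the doc comment notes also varies by reader.

```python
from enum import IntEnum


class CompressionCodec(IntEnum):
    """Mirror of the Thrift CompressionCodec enum after this commit."""
    UNCOMPRESSED = 0
    SNAPPY = 1
    GZIP = 2
    LZO = 3
    BROTLI = 4  # Added in 2.3.2
    LZ4 = 5     # Added in 2.3.2
    ZSTD = 6    # Added in 2.3.2


# Per the doc comment: codecs added in 2.3.2 can be read by readers
# based on 2.3.2 and later.
ADDED_IN_2_3_2 = {CompressionCodec.BROTLI, CompressionCodec.LZ4, CompressionCodec.ZSTD}


def parse_version(version: str) -> tuple:
    """Hypothetical helper: turn "2.3.2" into (2, 3, 2) for comparison."""
    return tuple(int(part) for part in version.split("."))


def format_allows(codec_id: int, reader_format_version: str) -> bool:
    """Hypothetical check: does a reader built on this format version
    know the codec id at all? (Says nothing about installed libraries.)"""
    try:
        codec = CompressionCodec(codec_id)
    except ValueError:
        # Unknown enum value, e.g. a codec defined after this commit.
        return False
    if codec in ADDED_IN_2_3_2:
        return parse_version(reader_format_version) >= (2, 3, 2)
    return True
```

For example, a reader based on 2.3.1 would reject a ZSTD column chunk (id 6), while a 2.3.2 reader would accept it and then still need a Zstd library at runtime.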
