Repository: parquet-format
Updated Branches:
refs/heads/master ddc18a7af -> 84460c5a1
PARQUET-1124: Add LZ4 and Zstd compression codecs.
This adds LZ4 and Zstd compression codecs to the format spec. From recent tests, Zstd appears
to outperform other codecs (including Brotli on reads). LZ4 is widely available because it
is built into Hadoop, making it a good successor to Snappy for fast compression and decompression
when speed is more important than compression ratio.
Author: Ryan Blue <blue@apache.org>
Closes #70 from rdblue/PARQUET-1124-add-compression-codecs and squashes the following commits:
939328e [Ryan Blue] PARQUET-1124: Add warning about external codec dependencies.
affad3d [Ryan Blue] PARQUET-1124: Add lz4 and zstd compression codecs.
Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/84460c5a
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/84460c5a
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/84460c5a
Branch: refs/heads/master
Commit: 84460c5a1e8aadf52a40dcf2aeb2fc875df4ac2a
Parents: ddc18a7
Author: Ryan Blue <blue@apache.org>
Authored: Tue Oct 10 12:55:27 2017 -0700
Committer: Ryan Blue <blue@apache.org>
Committed: Tue Oct 10 12:55:27 2017 -0700
----------------------------------------------------------------------
src/main/thrift/parquet.thrift | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/parquet-format/blob/84460c5a/src/main/thrift/parquet.thrift
----------------------------------------------------------------------
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index a4e193e..38cddc7 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -451,13 +451,20 @@ enum Encoding {
/**
* Supported compression algorithms.
+ *
+ * Codecs added in 2.3.2 can be read by readers based on 2.3.2 and later.
+ * Codec support may vary between readers based on the format version and
+ * libraries available at runtime. Gzip, Snappy, and LZ4 codecs are
+ * widely available, while Zstd and Brotli require additional libraries.
*/
enum CompressionCodec {
UNCOMPRESSED = 0;
SNAPPY = 1;
GZIP = 2;
LZO = 3;
- BROTLI = 4;
+ BROTLI = 4; // Added in 2.3.2
+ LZ4 = 5; // Added in 2.3.2
+ ZSTD = 6; // Added in 2.3.2
}
enum PageType {
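For readers of this diff, the following is a minimal Python sketch of how a reader might treat the extended enum. It only mirrors the numeric values shown in the CompressionCodec hunk above; the names `ADDED_IN_2_3_2` and `describe_codec`, and the idea of passing in a set of codecs available at runtime, are hypothetical illustrations and not part of parquet-format or any Parquet library.

```python
from enum import IntEnum


class CompressionCodec(IntEnum):
    """Mirror of the Thrift CompressionCodec enum as it stands after this commit."""
    UNCOMPRESSED = 0
    SNAPPY = 1
    GZIP = 2
    LZO = 3
    BROTLI = 4
    LZ4 = 5
    ZSTD = 6


# Codecs the diff annotates with "Added in 2.3.2" (hypothetical helper constant).
ADDED_IN_2_3_2 = frozenset(
    {CompressionCodec.BROTLI, CompressionCodec.LZ4, CompressionCodec.ZSTD}
)


def describe_codec(value, available):
    """Return a short status string for a codec id read from file metadata.

    `available` is the set of codecs the reader can actually decode at
    runtime; this parameter is an assumption made for illustration.
    """
    try:
        codec = CompressionCodec(value)
    except ValueError:
        # An older reader would hit this branch for ids it does not know.
        return f"unknown codec id {value}"
    if codec not in available:
        return f"{codec.name} is defined but no library is available"
    if codec in ADDED_IN_2_3_2:
        return f"{codec.name} (added in 2.3.2)"
    return codec.name
```

A reader that ships only the widely available codecs could call `describe_codec(6, {CompressionCodec.GZIP, CompressionCodec.SNAPPY, CompressionCodec.LZ4})` to detect that a Zstd-compressed column chunk cannot be decoded before attempting to read it.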