From: blue@apache.org
To: commits@parquet.apache.org
Reply-To: dev@parquet.apache.org
Subject: parquet-format git commit: PARQUET-1124: Add LZ4 and Zstd compression codecs.
Date: Tue, 10 Oct 2017 19:55:30 +0000 (UTC)

Repository: parquet-format
Updated Branches:
  refs/heads/master ddc18a7af -> 84460c5a1


PARQUET-1124: Add LZ4 and Zstd compression codecs.

This adds LZ4 and Zstd compression codecs to the format spec. From recent tests, Zstd appears to out-perform other codecs (including brotli on reads).
LZ4 is widely available because it is built into Hadoop, making it a good successor to snappy for fast compression and decompression when speed is more important than compression ratio.

Author: Ryan Blue

Closes #70 from rdblue/PARQUET-1124-add-compression-codecs and squashes the following commits:

939328e [Ryan Blue] PARQUET-1124: Add warning about external codec dependencies.
affad3d [Ryan Blue] PARQUET-1124: Add lz4 and zstd compression codecs.


Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/84460c5a
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/84460c5a
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/84460c5a

Branch: refs/heads/master
Commit: 84460c5a1e8aadf52a40dcf2aeb2fc875df4ac2a
Parents: ddc18a7
Author: Ryan Blue
Authored: Tue Oct 10 12:55:27 2017 -0700
Committer: Ryan Blue
Committed: Tue Oct 10 12:55:27 2017 -0700

----------------------------------------------------------------------
 src/main/thrift/parquet.thrift | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/parquet-format/blob/84460c5a/src/main/thrift/parquet.thrift
----------------------------------------------------------------------
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index a4e193e..38cddc7 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -451,13 +451,20 @@ enum Encoding {
 
 /**
  * Supported compression algorithms.
+ *
+ * Codecs added in 2.3.2 can be read by readers based on 2.3.2 and later.
+ * Codec support may vary between readers based on the format version and
+ * libraries available at runtime. Gzip, Snappy, and LZ4 codecs are
+ * widely available, while Zstd and Brotli require additional libraries.
  */
 enum CompressionCodec {
   UNCOMPRESSED = 0;
   SNAPPY = 1;
   GZIP = 2;
   LZO = 3;
-  BROTLI = 4;
+  BROTLI = 4; // Added in 2.3.2
+  LZ4 = 5; // Added in 2.3.2
+  ZSTD = 6; // Added in 2.3.2
 }
 
 enum PageType {
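
The version note in the new doc comment can be illustrated with a small reader-side check. The following is only a sketch in Python, not part of the commit or of any Parquet library: the enum values are copied from the diff, while `ADDED_IN`, `parse_version`, and `reader_knows_codec` are hypothetical names introduced here. It assumes that codecs without an "Added in" annotation are readable by every format version, and it does not model whether the codec's native library is present at runtime.

```python
from enum import IntEnum
from typing import Dict, Tuple

Version = Tuple[int, int, int]


class CompressionCodec(IntEnum):
    """Mirror of the CompressionCodec values shown in the diff."""
    UNCOMPRESSED = 0
    SNAPPY = 1
    GZIP = 2
    LZO = 3
    BROTLI = 4
    LZ4 = 5
    ZSTD = 6


# Format version in which a codec was added, taken from the "Added in"
# comments in the diff. Codecs absent from this table carry no annotation
# and are assumed (hypothetically) to be known to all readers.
ADDED_IN: Dict[CompressionCodec, Version] = {
    CompressionCodec.BROTLI: (2, 3, 2),
    CompressionCodec.LZ4: (2, 3, 2),
    CompressionCodec.ZSTD: (2, 3, 2),
}


def parse_version(text: str) -> Version:
    """Parse a 'major.minor.patch' string into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def reader_knows_codec(codec: CompressionCodec, reader_format_version: str) -> bool:
    """Return True if a reader built on the given format version knows the codec.

    This covers only the format-version rule from the doc comment; whether
    the compression library itself is installed is a separate question.
    """
    added = ADDED_IN.get(codec)
    return added is None or parse_version(reader_format_version) >= added
```

A reader could run such a check on the codec ID found in column metadata before attempting to decompress a page, and report an unsupported codec clearly instead of failing mid-read.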