parquet-commits mailing list archives

Subject parquet-format git commit: PARQUET-1171: Clarify scope of usage for RLE, BIT_PACKED encodings
Date Wed, 10 Jan 2018 03:05:05 GMT
Repository: parquet-format
Updated Branches:
  refs/heads/master c6d306daa -> 2696f9e0a

PARQUET-1171: Clarify scope of usage for RLE, BIT_PACKED encodings

See related discussions on the mailing list and JIRA.

Author: Wes McKinney <>

Closes #79 from wesm/PARQUET-1171 and squashes the following commits:

185348e [Wes McKinney] Fix typo
f29b38c [Wes McKinney] Add notes to indicate scope of usage for RLE, BIT_PACKED encodings


Branch: refs/heads/master
Commit: 2696f9e0a966bdb98afaca69bf633750a2b02ff2
Parents: c6d306d
Author: Wes McKinney <>
Authored: Tue Jan 9 22:04:57 2018 -0500
Committer: Wes McKinney <>
Committed: Tue Jan 9 22:04:57 2018 -0500

---------------------------------------------------------------------- | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/ b/
index 0450588..28429be 100644
--- a/
+++ b/
@@ -59,6 +59,7 @@ Data page format: the bit width used to encode the entry ids stored as 1 byte (m
 followed by the values encoded using RLE/Bit packed described above (with the given bit width).
 ### <a name="RLE"></a>Run Length Encoding / Bit-Packing Hybrid (RLE = 3)
 This encoding uses a combination of bit-packing and run length encoding to more efficiently store repeated values.
 The grammar for this encoding looks like this, given a fixed bit-width known in advance:
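To make the hybrid format concrete, here is a minimal Python sketch of a decoder. The run-header layout it assumes comes from our reading of the Parquet format spec, not from the lines shown in this diff. Each run starts with a ULEB-128 varint header. If the header's low bit is 0, the run is an RLE run of `header >> 1` copies of one value, stored little-endian in ceil(bit_width / 8) bytes. If the low bit is 1, the run is bit-packed: `header >> 1` groups of 8 values, packed least significant bit first. The dictionary-page helper follows the context line above, which describes a 1-byte bit width followed by the hybrid-encoded entry ids. All function names are hypothetical.

```python
# Sketch of a Parquet RLE / bit-packing hybrid decoder.
# ASSUMPTION: header layout as described in the lead-in (our reading of the spec).
# Function names are hypothetical, not a real library API.


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry payload
        if byte & 0x80 == 0:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_rle_bp_hybrid(data: bytes, bit_width: int, num_values: int) -> list[int]:
    """Decode num_values integers of the given bit width from hybrid-encoded data."""
    out: list[int] = []
    pos = 0
    value_bytes = (bit_width + 7) // 8  # byte width of an RLE run's repeated value
    mask = (1 << bit_width) - 1
    while len(out) < num_values:
        header, pos = read_uleb128(data, pos)
        if header & 1 == 0:
            # RLE run: one value, repeated (header >> 1) times.
            count = header >> 1
            value = int.from_bytes(data[pos:pos + value_bytes], "little")
            pos += value_bytes
            out.extend([value] * count)
        else:
            # Bit-packed run: (header >> 1) groups of 8 values, LSB-first.
            count = (header >> 1) * 8
            nbytes = (count * bit_width + 7) // 8
            # Reading the bytes as one little-endian integer makes stream bit k
            # equal to integer bit k, so value i sits at bits [i*w, (i+1)*w).
            packed = int.from_bytes(data[pos:pos + nbytes], "little")
            pos += nbytes
            out.extend((packed >> (i * bit_width)) & mask for i in range(count))
    # The final bit-packed group may be padded past num_values; drop the extras.
    return out[:num_values]


def decode_dictionary_indices(page: bytes, num_values: int) -> list[int]:
    """Dictionary data page body: 1-byte bit width, then hybrid-encoded ids."""
    bit_width = page[0]
    return decode_rle_bp_hybrid(page[1:], bit_width, num_values)
```

For example, with a bit width of 3, the bytes `0x03 0x88 0xC6 0xFA` form one bit-packed group and decode to the values 0 through 7. The sketch does no bounds checking, so it will not report malformed or truncated input cleanly.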
@@ -103,7 +104,15 @@ repeated-value := value that is repeated, using a fixed-width of round-up-to-nex
 2. varint-encode() is ULEB-128 encoding, see
+Note that the RLE encoding method is only supported for the following types of
+data:
+* Repetition and definition levels
+* Dictionary indices
+* Boolean values in data pages, as an alternative to PLAIN encoding
 ### <a name="BITPACKED"></a>Bit-packed (Deprecated) (BIT_PACKED = 4)
 This is a bit-packed only encoding, which is deprecated and will be replaced by the [RLE/bit-packing](#RLE) hybrid encoding.
 Each value is encoded back to back using a fixed width.
 There is no padding between values (except for the last byte, which is padded with 0s).
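The deprecated BIT_PACKED layout described above can be sketched in a few lines of Python. This sketch assumes, per our reading of the Parquet spec, that values are packed from the most significant bit to the least significant bit. Under that assumption, the values 0 through 7 at bit width 3 pack to `00000101 00111001 01110111`. The helper names are hypothetical.

```python
# Sketch of the deprecated BIT_PACKED layout: fixed-width values back to back,
# MSB-first (ASSUMPTION: bit order per our reading of the spec), with the last
# byte zero-padded. Helper names are hypothetical.


def bit_pack_msb_first(values: list[int], bit_width: int) -> bytes:
    """Pack values back to back, most significant bit first."""
    acc = 0
    for v in values:
        if v < 0 or v >> bit_width:
            raise ValueError(f"{v} does not fit in {bit_width} bits")
        acc = (acc << bit_width) | v  # append v's bits at the low end
    total_bits = len(values) * bit_width
    nbytes = (total_bits + 7) // 8
    # Shift left so the unused trailing bits of the last byte are zeros.
    acc <<= nbytes * 8 - total_bits
    return acc.to_bytes(nbytes, "big")


def bit_unpack_msb_first(data: bytes, bit_width: int, num_values: int) -> list[int]:
    """Inverse of bit_pack_msb_first; num_values must fit within data."""
    acc = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    mask = (1 << bit_width) - 1
    # Value i occupies the i-th bit_width-wide slice counting from the top.
    return [(acc >> (total_bits - (i + 1) * bit_width)) & mask for i in range(num_values)]
```

This bit order is the reverse of the bit-packed runs in the RLE/bit-packing hybrid, which fill each byte starting from the least significant bit.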
@@ -126,6 +135,9 @@ bit value: 00000101 00111001 01110111
+Note that the BIT_PACKED encoding method is only supported for encoding
+repetition and definition levels.
 ### <a name="DELTAENC"></a>Delta Encoding (DELTA_BINARY_PACKED = 5)
 Supported Types: INT32, INT64
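The scope-of-usage notes added by this commit amount to a small lookup table. Below is a hypothetical sketch of how a writer might check it. The enum, table, and function names are illustrative only and do not come from any Parquet implementation.

```python
# Hypothetical validation table for the scope notes in this commit:
# RLE -> levels, dictionary indices, boolean data-page values;
# BIT_PACKED (deprecated) -> repetition and definition levels only.
from enum import Enum


class Usage(Enum):
    REPETITION_LEVELS = "repetition levels"
    DEFINITION_LEVELS = "definition levels"
    DICTIONARY_INDICES = "dictionary indices"
    BOOLEAN_VALUES = "boolean values in data pages"
    OTHER_VALUES = "other data page values"


ALLOWED_USAGES: dict[str, frozenset[Usage]] = {
    "RLE": frozenset({
        Usage.REPETITION_LEVELS,
        Usage.DEFINITION_LEVELS,
        Usage.DICTIONARY_INDICES,
        Usage.BOOLEAN_VALUES,
    }),
    "BIT_PACKED": frozenset({Usage.REPETITION_LEVELS, Usage.DEFINITION_LEVELS}),
}


def encoding_allowed(encoding: str, usage: Usage) -> bool:
    """Return whether `encoding` may be used for `usage`.

    Only RLE and BIT_PACKED are covered; any other encoding name raises
    KeyError because this commit says nothing about it.
    """
    return usage in ALLOWED_USAGES[encoding]
```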
