parquet-commits mailing list archives

From w...@apache.org
Subject parquet-format git commit: PARQUET-1171: Clarify scope of usage for RLE, BIT_PACKED encodings
Date Wed, 10 Jan 2018 03:05:05 GMT
Repository: parquet-format
Updated Branches:
  refs/heads/master c6d306daa -> 2696f9e0a


PARQUET-1171: Clarify scope of usage for RLE, BIT_PACKED encodings

See related discussions on mailing list, JIRA

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #79 from wesm/PARQUET-1171 and squashes the following commits:

185348e [Wes McKinney] Fix typo
f29b38c [Wes McKinney] Add notes to indicate scope of usage for RLE, BIT_PACKED encodings


Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/2696f9e0
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/2696f9e0
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/2696f9e0

Branch: refs/heads/master
Commit: 2696f9e0a966bdb98afaca69bf633750a2b02ff2
Parents: c6d306d
Author: Wes McKinney <wes.mckinney@twosigma.com>
Authored: Tue Jan 9 22:04:57 2018 -0500
Committer: Wes McKinney <wes.mckinney@twosigma.com>
Committed: Tue Jan 9 22:04:57 2018 -0500

----------------------------------------------------------------------
 Encodings.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/parquet-format/blob/2696f9e0/Encodings.md
----------------------------------------------------------------------
diff --git a/Encodings.md b/Encodings.md
index 0450588..28429be 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -59,6 +59,7 @@ Data page format: the bit width used to encode the entry ids stored as 1 byte (m
 followed by the values encoded using RLE/Bit packed described above (with the given bit width).
 
 ### <a name="RLE"></a>Run Length Encoding / Bit-Packing Hybrid (RLE = 3)
+
 This encoding uses a combination of bit-packing and run length encoding to more efficiently store repeated values.
 
 The grammar for this encoding looks like this, given a fixed bit-width known in advance:
@@ -103,7 +104,15 @@ repeated-value := value that is repeated, using a fixed-width of round-up-to-nex
 
 2. varint-encode() is ULEB-128 encoding, see https://en.wikipedia.org/wiki/LEB128
 
+Note that the RLE encoding method is only supported for the following types of
+data:
+
+* Repetition and definition levels
+* Dictionary indices
+* Boolean values in data pages, as an alternative to PLAIN encoding
+
 ### <a name="BITPACKED"></a>Bit-packed (Deprecated) (BIT_PACKED = 4)
+
 This is a bit-packed only encoding, which is deprecated and will be replaced by the [RLE/bit-packing](#RLE) hybrid encoding.
 Each value is encoded back to back using a fixed width.
 There is no padding between values (except for the last byte) which is padded with 0s.
@@ -126,6 +135,9 @@ bit value: 00000101 00111001 01110111
 bit label: ABCDEFGH IJKLMNOP QRSTUVWX
 ```
 
+Note that the BIT_PACKED encoding method is only supported for encoding
+repetition and definition levels.
+
 ### <a name="DELTAENC"></a>Delta Encoding (DELTA_BINARY_PACKED = 5)
 Supported Types: INT32, INT64
 

