parquet-commits mailing list archives

From sp...@apache.org
Subject parquet-format git commit: PARQUET-407: Incorrect delta-encoding example
Date Wed, 06 Jan 2016 23:49:03 GMT
Repository: parquet-format
Updated Branches:
  refs/heads/master 6a17cf9c3 -> c3682e3b3


PARQUET-407: Incorrect delta-encoding example

https://issues.apache.org/jira/browse/PARQUET-407

The minimum delta and the packed bits are incorrect in delta-encoding Example 2 in `Encodings.md`.
The example currently reads:

```
Example 2

7, 5, 3, 1, 2, 3, 4, 5, the deltas would be

-2, -2, -2, 1, 1, 1, 1
The minimum is -2, so the relative deltas are:

0, 0, 0, 3, 3, 3, 3

The encoded data is

header: 8 (block size), 1 (miniblock count), 8 (value count), 7 (first value)

block 0 (minimum delta), 2 (bitwidth), 000000111111b (0,0,0,3,3,3 packed on 2 bits)
```

The minimum delta is -2 and there are seven relative deltas (0, 0, 0, 3, 3, 3, 3), so the
block line should be corrected as follows:

```
block -2 (minimum delta), 2 (bitwidth), 00000011111111b (0,0,0,3,3,3,3 packed on 2 bits)
```
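The corrected numbers can be checked with a short sketch. The helper `delta_block` below is hypothetical and is not part of parquet-format or any Parquet library. It assumes at least two input values, and it writes the bits as a left-to-right string the way the example prints them, not in the byte layout a real writer would produce.

```python
def delta_block(values):
    """Return (first value, min delta, bit width, relative deltas, bit string).

    Illustrative only: assumes len(values) >= 2 and a single miniblock.
    """
    first = values[0]
    # Differences between consecutive values.
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas)
    # Subtract the minimum so every relative delta is non-negative.
    relative = [d - min_delta for d in deltas]
    # Smallest width that can hold the largest relative delta.
    bit_width = max(relative).bit_length()
    if bit_width == 0:
        packed = ""  # all deltas are equal, so nothing needs to be packed
    else:
        # Fixed-width binary digits, concatenated in order.
        packed = "".join(format(r, "0{}b".format(bit_width)) for r in relative)
    return first, min_delta, bit_width, relative, packed


print(delta_block([7, 5, 3, 1, 2, 3, 4, 5]))
```

For the values in Example 2 this yields a minimum delta of -2, a bit width of 2, and seven relative deltas packed into 14 bits, which agrees with the corrected block line.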

Author: socialpercon <socialpercon@gmail.com>

Closes #35 from socialpercon/master and squashes the following commits:

3d5886a [socialpercon] Change incorrect delta-encoding example


Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/c3682e3b
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/c3682e3b
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/c3682e3b

Branch: refs/heads/master
Commit: c3682e3b3dd096f3665de2bde405e30bcbd36d7b
Parents: 6a17cf9
Author: socialpercon <socialpercon@gmail.com>
Authored: Wed Jan 6 17:48:03 2016 -0600
Committer: Sergio Pena <sergio.pena@cloudera.com>
Committed: Wed Jan 6 17:48:03 2016 -0600

----------------------------------------------------------------------
 Encodings.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/parquet-format/blob/c3682e3b/Encodings.md
----------------------------------------------------------------------
diff --git a/Encodings.md b/Encodings.md
index 662e6af..42961ba 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -198,7 +198,7 @@ The encoded data is
 8 (block size), 1 (miniblock count), 8 (value count), 7 (first value)
 
  block
-0 (minimum delta), 2 (bitwidth), 000000111111b (0,0,0,3,3,3 packed on 2 bits)
+-2 (minimum delta), 2 (bitwidth), 00000011111111b (0,0,0,3,3,3,3 packed on 2 bits)
 
 #### Characteristics
 This encoding is similar to the [RLE/bit-packing](#RLE) encoding. However the [RLE/bit-packing](#RLE)
encoding is specifically used when the range of ints is small over the entire page, as is
true of repetition and definition levels. It uses a single bit width for the whole page.

