orc-dev mailing list archives

From omalley <...@git.apache.org>
Subject [GitHub] orc pull request #245: ORC-161: Proposal for new decimal encodings and stati...
Date Fri, 13 Apr 2018 22:51:09 GMT
Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/245#discussion_r181525137
  
    --- Diff: site/_docs/encodings.md ---
    @@ -109,10 +109,20 @@ DIRECT_V2     | PRESENT         | Yes      | Boolean RLE
     Decimal was introduced in Hive 0.11 with infinite precision (the total
     number of digits). In Hive 0.13, the definition was changed to limit
     the precision to a maximum of 38 digits, which conveniently uses 127
    -bits plus a sign bit. The current encoding of decimal columns stores
    -the integer representation of the value as an unbounded length zigzag
    -encoded base 128 varint. The scale is stored in the SECONDARY stream
    -as a signed integer.
    +bits plus a sign bit.
    +
    +DIRECT and DIRECT_V2 encodings of decimal columns store the integer
    +representation of the value as an unbounded length zigzag encoded base
    +128 varint. The scale is stored in the SECONDARY stream as a signed
    +integer.
    +
    +In ORC 2.0, DECIMAL_V1 and DECIMAL_V2 encodings are introduced and
    --- End diff --
    
    In ORCv2, we'll just pick an RLE and not leave it configurable.
    
    In terms of the encoding names, I'm a bit torn. My original inclination would be to use
    DECIMAL64 and DECIMAL128 as encoding names. However, it would be nice to have the ability
    to use dictionaries, so we'd need dictionary forms of them too. Thoughts?
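For readers following the discussion, here is a minimal Python sketch of the DIRECT/DIRECT_V2 decimal value encoding described in the quoted diff: the unscaled integer is zigzag-mapped to an unsigned value and then written as a base-128 varint, while the scale goes separately to the SECONDARY stream. It assumes the standard zigzag mapping (0→0, -1→1, 1→2, -2→3) and a protobuf-style varint layout (7 bits per byte, low-order group first, high bit set on every byte except the last). The function names are hypothetical illustrations, not ORC's Java or C++ API. It also checks the "127 bits plus a sign bit" claim: the largest 38-digit magnitude needs exactly 127 bits.

```python
def zigzag(n: int) -> int:
    """Map a signed integer to an unsigned one: 0->0, -1->1, 1->2, -2->3, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag(): even values are non-negative, odd values negative."""
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


def encode_varint(u: int) -> bytes:
    """Unbounded base-128 varint (assumed protobuf-style layout)."""
    if u < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        group = u & 0x7F  # low-order 7 bits
        u >>= 7
        if u:
            out.append(group | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(group)  # final byte has the high bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Read one varint from the start of data."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return result
    raise ValueError("truncated varint")


def encode_decimal_value(unscaled: int) -> bytes:
    """Hypothetical helper: the bytes one decimal value contributes to the DATA stream."""
    return encode_varint(zigzag(unscaled))


if __name__ == "__main__":
    # 12.34 as DECIMAL(4,2): unscaled value 1234 goes to the DATA stream,
    # and the scale 2 would be written to the SECONDARY stream.
    print(encode_decimal_value(1234).hex())  # a413

    # The largest 38-digit magnitude fits in 127 bits, leaving room for a sign bit.
    max38 = 10**38 - 1
    print(max38.bit_length())  # 127
    # After zigzag it needs 128 bits, i.e. 19 varint bytes (7 payload bits each).
    print(len(encode_decimal_value(max38)))  # 19
```

Because the varint is unbounded, small values stay short (one or two bytes) while a full 38-digit value costs 19 bytes; that variable width is part of what fixed-width forms such as the proposed DECIMAL64 and DECIMAL128 encodings would trade away in exchange for RLE-friendly layouts.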


---
