orc-dev mailing list archives

From "Douglas Drinka (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ORC-143) DELTA encoding may exaggerate number of bits required
Date Wed, 08 Feb 2017 17:13:41 GMT
Douglas Drinka created ORC-143:

             Summary: DELTA encoding may exaggerate number of bits required
                 Key: ORC-143
                 URL: https://issues.apache.org/jira/browse/ORC-143
             Project: Orc
          Issue Type: Bug
          Components: Java
    Affects Versions: 1.4.0
            Reporter: Douglas Drinka
            Priority: Minor

Consider the following code:
{code:title=RunLengthIntegerWriterV2.java, determineEncoding()|borderStyle=solid}
    this.min = literals[0];
    long max = literals[0];
    final long initialDelta = literals[1] - literals[0];
    long currDelta = initialDelta;
    long deltaMax = initialDelta;
    this.adjDeltas[0] = initialDelta;
{code}

Given the sequence of longs {0, 10000, 10001, 10002, 10003, 10004, 10005}, {{deltaMax}}
would be 10000.  {{deltaMax}} is used to determine the bit width of the encoded delta array,
but the bit-packed output doesn't include the first delta; it is encoded separately in
Delta Base as a varint.

I believe {{deltaMax}} should be set to 0 initially, allowing the later check for
{{(i > 1)}} to ignore the first delta correctly.
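
To make the size of the effect concrete, here is a minimal sketch that compares the two seeding choices on the sequence above. It is not the ORC implementation: the class {{DeltaWidthSketch}} and its methods are hypothetical names for illustration, the loop is a simplified stand-in for the one in {{determineEncoding()}}, and it assumes the bit width is simply the number of bits needed to hold {{deltaMax}}.

```java
// Hypothetical illustration only; names and structure are assumptions,
// not code taken from RunLengthIntegerWriterV2.
final class DeltaWidthSketch {

    /** Bits needed to hold a non-negative value (at least 1). */
    static int bitsRequired(long value) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(value));
    }

    /** deltaMax seeded with the first delta, as in the quoted snippet. */
    static long deltaMaxSeededWithFirst(long[] literals) {
        long deltaMax = literals[1] - literals[0];
        for (int i = 2; i < literals.length; i++) {
            deltaMax = Math.max(deltaMax, Math.abs(literals[i] - literals[i - 1]));
        }
        return deltaMax;
    }

    /** deltaMax seeded with 0, so only deltas after the first one count. */
    static long deltaMaxSeededWithZero(long[] literals) {
        long deltaMax = 0;
        for (int i = 2; i < literals.length; i++) {
            deltaMax = Math.max(deltaMax, Math.abs(literals[i] - literals[i - 1]));
        }
        return deltaMax;
    }

    public static void main(String[] args) {
        long[] literals = {0, 10000, 10001, 10002, 10003, 10004, 10005};
        // Seeded with the first delta: deltaMax = 10000, which needs 14 bits.
        System.out.println(bitsRequired(deltaMaxSeededWithFirst(literals)));
        // Seeded with 0: deltaMax = 1, which needs 1 bit.
        System.out.println(bitsRequired(deltaMaxSeededWithZero(literals)));
    }
}
```

Under these assumptions the five bit-packed deltas would each take 14 bits instead of 1, even though the large first delta is stored separately.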

Sorry for not including a pull request with a regression test case.  I'm not set up for
Java development here.  It may also be that I'm reading this wrong.

