From: majetideepak
To: dev@orc.apache.org
Subject: [GitHub] orc pull request #273: ORC-343 Enable C++ writer to support RleV2
Date: Sun, 3 Jun 2018 15:35:29 +0000 (UTC)

GitHub user majetideepak commented on a diff in the pull request:

https://github.com/apache/orc/pull/273#discussion_r192593861

--- Diff: c++/src/RLEv2.hh ---
@@ -25,13 +25,89 @@
 #include 
+#define MIN_REPEAT 3
+#define HIST_LEN 32
 namespace orc {
-class RleDecoderV2 : public RleDecoder {
+struct FixedBitSizes {
+  enum FBS {
+    ONE = 0, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, ELEVEN, TWELVE,
+    THIRTEEN, FOURTEEN, FIFTEEN, SIXTEEN, SEVENTEEN, EIGHTEEN, NINETEEN,
+    TWENTY, TWENTYONE, TWENTYTWO, TWENTYTHREE, TWENTYFOUR, TWENTYSIX,
+    TWENTYEIGHT, THIRTY, THIRTYTWO, FORTY, FORTYEIGHT, FIFTYSIX, SIXTYFOUR, SIZE
+  };
+};
+
+enum EncodingType { SHORT_REPEAT=0, DIRECT=1, PATCHED_BASE=2, DELTA=3 };
+
+struct EncodingOption {
+  EncodingType encoding;
+  int64_t fixedDelta;
+  int64_t gapVsPatchListCount;
+  int64_t zigzagLiteralsCount;
+  int64_t baseRedLiteralsCount;
+  int64_t adjDeltasCount;
+  uint32_t zzBits90p;
+  uint32_t zzBits100p;
+  uint32_t brBits95p;
+  uint32_t brBits100p;
+  uint32_t bitsDeltaMax;
+  uint32_t patchWidth;
+  uint32_t patchGapWidth;
+  uint32_t patchLength;
+  int64_t min;
+  bool isFixedDelta;
+};
+
+class RleEncoderV2 : public RleEncoder {
 public:
+  RleEncoderV2(std::unique_ptr outStream, bool hasSigned, bool alignBitPacking = true);
--- End diff --

`alignedBitPacking` is always true. Should we add a WriterOption to enable or disable it? The Java writer derives this setting from its encoding strategy and turns aligned bit packing on only for `EncodingStrategy.SPEED`. The C++ writer has no encoding strategy option yet.

```
java/core/src/java/org/apache/orc/impl/writer/TreeWriterBase.java:144
if (writer.getEncodingStrategy().equals(OrcFile.EncodingStrategy.SPEED)) {
  alignedBitpacking = true;
}
```

---
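For illustration, a minimal sketch of what such an option could control. Everything in it is hypothetical: `sketch::EncodingStrategy`, `alignBitPackingFor`, `closestFixedBits`, `closestAlignedFixedBits` and `packedWidth` are not existing ORC C++ APIs. It assumes the C++ flag is meant to behave like Java's `alignedBitpacking`, i.e. the bit width of packed values is rounded up to a byte-friendly size (1, 2, 4, 8, 16, 24, ..., 64) instead of to the nearest width listed in `FixedBitSizes`.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical C++ counterpart of Java's OrcFile.EncodingStrategy
// (assumption: not an existing ORC C++ type).
enum class EncodingStrategy { SPEED, COMPRESSION };

// Mirrors the quoted Java check: aligned bit packing only for SPEED.
constexpr bool alignBitPackingFor(EncodingStrategy strategy) {
  return strategy == EncodingStrategy::SPEED;
}

// Round a required bit width up to the next width in FixedBitSizes:
// 1..24, 26, 28, 30, 32, 40, 48, 56, 64. A width of 0 maps to 1.
constexpr uint32_t closestFixedBits(uint32_t n) {
  if (n == 0) return 1;
  if (n <= 24) return n;
  if (n <= 26) return 26;
  if (n <= 28) return 28;
  if (n <= 30) return 30;
  if (n <= 32) return 32;
  if (n <= 40) return 40;
  if (n <= 48) return 48;
  if (n <= 56) return 56;
  return 64;
}

// Round a required bit width up to a byte-friendly width:
// 1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64.
constexpr uint32_t closestAlignedFixedBits(uint32_t n) {
  if (n <= 1) return 1;
  if (n <= 2) return 2;
  if (n <= 4) return 4;
  if (n <= 8) return 8;
  if (n <= 16) return 16;
  if (n <= 24) return 24;
  if (n <= 32) return 32;
  if (n <= 40) return 40;
  if (n <= 48) return 48;
  if (n <= 56) return 56;
  return 64;
}

// Width actually used for packing, depending on the (hypothetical) flag.
constexpr uint32_t packedWidth(uint32_t requiredBits, bool alignBitPacking) {
  return alignBitPacking ? closestAlignedFixedBits(requiredBits)
                         : closestFixedBits(requiredBits);
}

}  // namespace sketch
```

Under these assumptions, values needing 13 bits would be packed at 16 bits with `SPEED` and at 13 bits with `COMPRESSION`, so the option trades some output size for byte-aligned widths that are presumably cheaper to unpack.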