From: "Tim Armstrong (Code Review)"
To: Lars Volker, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Reply-To: tarmstrong@cloudera.com, impala-cr@cloudera.com, lv@cloudera.com, marcelk@gmail.com, reviews@impala.incubator.apache.org
Date: Fri, 20 Oct 2017 17:56:33 +0000
Subject: [Impala-ASF-CR] IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
X-Gerrit-Commit: 88b860bfb24a4abd7aba6dc8077524b7b32e2c6b

Hello Lars Volker,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8267

to look at the new patch set (#10).

Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
......................................................................

IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding

Switch the decoders to using more batch-oriented interfaces. As an
intermediate step this doesn't make the interfaces of LevelDecoder or
DictDecoder batch-oriented, only the lower-level utility classes.

The next step would be to change those interfaces to be batch-oriented
and make corresponding optimisations in parquet. This could deliver much
larger perf improvements than the current patch.

The high-level changes are:
* BitReader -> BatchedBitReader, which is built to unpack runs of 32
  bit-packed values efficiently.
* RleDecoder -> RleBatchDecoder, which exposes the repeated and literal
  runs to the caller and uses BatchedBitReader to unpack literal runs
  efficiently.
* Dict decoding uses RleBatchDecoder to decode repeated runs efficiently
  and uses the BitPacking utilities to unpack and encode in a single
  step.

Also removes an older benchmark that isn't too interesting (since
the batch-oriented approach to encoding and decoding is so much faster
than the value-by-value approach).

Testing:
* Ran core tests.
* Updated unit tests to exercise new code.
* Added test coverage for the deprecated bit-packed level encoding to
  check that it still works (there was no coverage previously).

Perf:
Single-node benchmarks showed a performance gain of a few percent.
16-node cluster benchmarks only showed a gain for TPC-H nested.

Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
---
M be/src/benchmarks/CMakeLists.txt
M be/src/benchmarks/bit-packing-benchmark.cc
D be/src/benchmarks/rle-benchmark.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
D be/src/experiments/bit-stream-utils.8byte.h
D be/src/experiments/bit-stream-utils.8byte.inline.h
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
M be/src/util/parquet-reader.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
M testdata/data/README
A testdata/data/alltypes_agg_bitpacked_def_levels.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test
M tests/query_test/test_scanners.py
20 files changed, 1,251 insertions(+), 962 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/8267/10
--
To view, visit http://gerrit.cloudera.org:8080/8267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Gerrit-Change-Number: 8267
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Tim Armstrong
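
Illustrative note: the run-oriented decoding described in the high-level changes can be sketched roughly as below. This is not Impala's code or API. It is a minimal Python sketch that assumes the Parquet RLE/bit-packing hybrid layout (varint run header, LSB-first bit packing), and every name in it (unpack_bits, read_varint, iter_runs, decode_all) is hypothetical. The point it shows is that a decoder can hand whole repeated or literal runs to the caller instead of returning one value per call.

```python
from typing import Iterator, List, Tuple


def unpack_bits(data: bytes, bit_width: int, count: int) -> List[int]:
    """Unpack `count` LSB-first bit-packed values of `bit_width` bits in one call."""
    if bit_width == 0:
        return [0] * count
    # Treat the buffer as one little-endian integer and slice fixed-width fields.
    bits = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [(bits >> (i * bit_width)) & mask for i in range(count)]


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_runs(buf: bytes, bit_width: int) -> Iterator[Tuple[str, object]]:
    """Yield ("repeated", (value, count)) or ("literal", [values]) for each run."""
    pos = 0
    value_bytes = (bit_width + 7) // 8
    while pos < len(buf):
        header, pos = read_varint(buf, pos)
        if header & 1:
            # Literal run: (header >> 1) groups of 8 bit-packed values.
            count = (header >> 1) * 8
            nbytes = (count * bit_width + 7) // 8
            yield "literal", unpack_bits(buf[pos:pos + nbytes], bit_width, count)
            pos += nbytes
        else:
            # Repeated run: one value stored once, repeated (header >> 1) times.
            value = int.from_bytes(buf[pos:pos + value_bytes], "little")
            pos += value_bytes
            yield "repeated", (value, header >> 1)


def decode_all(buf: bytes, bit_width: int) -> List[int]:
    """Materialize every run; a repeated run costs one list multiply, not N reads."""
    out: List[int] = []
    for kind, payload in iter_runs(buf, bit_width):
        if kind == "repeated":
            value, count = payload
            out.extend([value] * count)
        else:
            out.extend(payload)
    return out


# A repeated run (value 5, ten times) followed by a literal run of 0..7, width 3.
example = bytes([0x14, 0x05, 0x03, 0x88, 0xC6, 0xFA])
decoded = decode_all(example, 3)
```

Here decoded is ten 5s followed by 0 through 7. A caller that only needs to know "the next ten values are all 5" can consume iter_runs directly and skip materializing the run at all, which is the kind of saving a batch-oriented interface makes possible.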