Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id B8C4B200BA5 for ; Wed, 5 Oct 2016 01:44:11 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id B7319160ADC; Tue, 4 Oct 2016 23:44:11 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id D5AF1160ACC for ; Wed, 5 Oct 2016 01:44:10 +0200 (CEST) Received: (qmail 12866 invoked by uid 500); 4 Oct 2016 23:44:10 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 12854 invoked by uid 99); 4 Oct 2016 23:44:09 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 04 Oct 2016 23:44:09 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 6909CC0E0C for ; Tue, 4 Oct 2016 23:44:09 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx2-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id wj8OUmxbezvh for ; Tue, 4 Oct 2016 23:44:07 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx2-lw-eu.apache.org (ASF Mail Server at mx2-lw-eu.apache.org) with ESMTPS id C77B55F393 for ; Tue, 4 Oct 2016 23:44:06 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id u94Ni5LN004440; Tue, 4 Oct 2016 23:44:05 GMT Message-Id: <201610042344.u94Ni5LN004440@ip-10-146-233-104.ec2.internal> Date: Tue, 4 Oct 2016 23:44:05 +0000 From: "Jim Apple (Code Review)" To: Tim Armstrong , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Jim Apple Reply-To: jbapple@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-4123=3A_Fast_bit_unpacking=0A?= X-Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 X-Gerrit-ChangeURL: X-Gerrit-Commit: 13e1633d66391d8a7acea0c548fa4b2da3426457 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Tue, 04 Oct 2016 23:44:11 -0000 Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking ...................................................................... Patch Set 3: (34 comments) http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 296: } > We need to do this to start the reader reading from the start of the buffer Should offset advance in this loop, then? http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 270: Why the blank line here? PS3, Line 277: NUM_VALUES Maybe NUM_OUT_VALUES PS3, Line 305: * d const uint8_t * const data_end ... Line 334: data[i] = static_cast(i); std::iota PS3, Line 336: ( I think you can even elide the parens http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS3, Line 29: compute_mask ComputeMask, and in an anonymous namespace PS3, Line 33: UnpackSubset With a comment here, if I'm going to use it in PackUnpack. PS3, Line 37: with memory that is this phrase can be omitted PS3, Line 39: int also unsigned? PS3, Line 39: int unsigned? DCHECK <= something? PS3, Line 39: num_values num_in_values? PS3, Line 45: int const, here and elsewhere PS3, Line 51: & compute_mask(bit_width) AKA mask; already hoisted PS3, Line 58: becase because PS3, Line 58: the input buffer size "the input buffer size (num_values)" PS3, Line 60: // Size buffer exactly so that ASAN can detect buffer overruns. I think manual canaray checks might be called for here PS3, Line 65: ASSERT_EQ Can you add more " << error message " to your ASSERTs? http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: Line 36: /// little-endian order). E.g. the values 1, 2, 3, 4 packed with bit width 4 results > We don't do this anywhere else, seems unnecessarily verbose. Yeah, the verbosity is frustrating. However, cstdint doesn't necessarily pt these in the global namespace, and stdint.h is deprecated. What color do you think we should paint this bikeshed? Maybe a using declaration at the top? PS2, Line 57: Type> > Added some explanation to the class comment for the magic number. It works So, this boundary condition happens for any multiple of 8, and you chose 32 to amortize the loop some more? Did you do any testing of 16 and 64 to see how they compare? http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS3, Line 55: after the last byte of 'in' that was read But the last byte of 'in' that was read might only have been partially used, right? http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS2, Line 108: \ : if (end_bit_offset < load_bit_width) { > Reworded this. I did some experiments early on looking at the assembly and Did you try always_inline and template parameters? If yes, what else did you try? http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS3, Line 56: == 0 can be omitted if the branches are swapped Line 56: const int64_t max_input_values = BIT_WIDTH == 0 ? num_values : (in_bytes * CHAR_BIT) / BIT_WIDTH; long line PS3, Line 90: early : // with I think you accidentally a word PS3, Line 93: LOAD_TYPE I think "LOAD_TYPE" could be "LoadType" to more clearly match the lexical standard for other type names PS3, Line 93: i If 'i' is a compile-time constant, maybe call it "I" or "NTH"? PS3, Line 96: load_bit_width I think constexprs should get the static const treatment of all caps PS3, Line 106: shifted Can you add a comment describing what this is for? PS3, Line 108: 32 How is this 32 derived? PS3, Line 109: end_bit_offset so, this case al the bits we need are in one word? PS3, Line 119: Make non-negative to avoid spurious compiler warning Maybe give it an unsigned type? PS3, Line 124: Special-case impossible BIT_WIDTH 32 to avoid spurious compiler warning this could use a bit more explanation PS3, Line 143: define Check not already defined? -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes