Date: Sun, 23 Oct 2016 21:16:50 +0000
From: "Tim Armstrong (Code Review)"
To: Jim Apple, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Reply-To: tarmstrong@cloudera.com
Subject: [Impala-ASF-CR] IMPALA-4300: Speed up BloomFilter::Or with SIMD

Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4300: Speed up BloomFilter::Or with SIMD
......................................................................

Patch Set 1:

(4 comments)

LGTM, just want to make sure the comment is a little clearer.

http://gerrit.cloudera.org:8080/#/c/4813/1/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

Line 167
> What's strange is that without the pragma and without __restrict__, Matt Go
I experimented with some variants: https://godbolt.org/g/rFww4g . BloomFilterOrPointersUnrolledIvDep() and BloomFilterOrIvDepInt() emit exactly the code you want.

There are a few things that make a difference:
* Avoiding the indirection via the in/out pointers helps a lot. If we extract vector.data()/vector.size() from the loop body we can get the same effect.
* Rather subtly, whether the loop counter is signed or unsigned makes a difference (I think with a signed type the compiler is allowed to assume no overflow, and this somehow helps).
* The unrolling and #pragma ivdep aren't always necessary to emit the code, but it's a lot cleaner with them. I think gcc's vectoriser is smart enough to insert a check for whether the arrays overlap and to emit two versions of the code for the overlapping/non-overlapping cases.

The explicit SIMD version has the virtue of not being sensitive to all these things.

PS1, Line 168: _mm256_loadu_pd
> BloomFilters do, but TBloomFilters do not, and from using gdb I see that th
Oh, duh.

PS1, Line 199: _mm_storeu_si128
> Let's continue this conversation above.
Yeah, I doubt there's much difference between the aligned/unaligned versions, I just didn't want to use the unaligned ones if there was already an invariant that the memory was aligned.

http://gerrit.cloudera.org:8080/#/c/4813/2/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

Line 181: // The trivial loop out[i] |= in[i] should auto-vectorize with gcc at -O3, but it is not
Let's fix this comment so it's clear that the issue is that the code is not written in a way that is friendly to auto-vectorization. I've seen other cryptic comments about these kinds of things ("compiler is confused by x"), and it's hard to know what to do with them (if they're not outright misleading).

--
To view, visit http://gerrit.cloudera.org:8080/4813

Gerrit-MessageType: comment
Gerrit-Change-Id: I840799d9cfb81285c796e2abfe2029bb869b0f67
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
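
For reference, the two approaches compared in the Line 167 comment can be sketched roughly as below. This is an illustration only, not the code from the patch or from the godbolt link: the names OrScalar, OrSimd and OrVectors are hypothetical, the SIMD path uses 128-bit SSE2 rather than AVX so it builds without extra flags, and whether the scalar loop actually gets vectorized depends on the compiler version and options.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_loadu_si128, _mm_or_si128, _mm_storeu_si128
#endif

// Hypothetical scalar variant: raw pointers hoisted out of the containers, a
// signed loop counter, and __restrict__ (a GCC/Clang extension) promising that
// the two arrays do not overlap. A compiler may auto-vectorize this at -O3.
void OrScalar(const std::uint8_t* __restrict__ in, std::uint8_t* __restrict__ out,
              std::ptrdiff_t n) {
  for (std::ptrdiff_t i = 0; i < n; ++i) out[i] |= in[i];
}

// Hypothetical explicit SIMD variant: ORs 16 bytes per iteration using
// unaligned loads/stores, so it makes no assumption about buffer alignment.
// Without SSE2 it falls back to the plain byte loop.
void OrSimd(const std::uint8_t* in, std::uint8_t* out, std::ptrdiff_t n) {
  std::ptrdiff_t i = 0;
#if defined(__SSE2__)
  for (; i + 16 <= n; i += 16) {
    const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
    const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(out + i));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_or_si128(a, b));
  }
#endif
  for (; i < n; ++i) out[i] |= in[i];  // tail bytes, or everything without SSE2
}

// Extracts data()/size() once, outside the loop, before dispatching.
void OrVectors(const std::vector<std::uint8_t>& in, std::vector<std::uint8_t>* out) {
  assert(in.size() == out->size());
  OrSimd(in.data(), out->data(), static_cast<std::ptrdiff_t>(in.size()));
}
```

Both functions compute the same result; the difference is that the explicit version does not rely on the optimizer recognizing the loop.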