Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id A1B49200CC9 for ; Mon, 3 Jul 2017 02:05:02 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id A00D3160BFE; Mon, 3 Jul 2017 00:05:02 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id E590C160BEE for ; Mon, 3 Jul 2017 02:05:01 +0200 (CEST) Received: (qmail 84796 invoked by uid 500); 3 Jul 2017 00:05:01 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 84515 invoked by uid 99); 3 Jul 2017 00:05:00 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 03 Jul 2017 00:05:00 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 6975CC0D5B for ; Mon, 3 Jul 2017 00:05:00 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id VqW1QzPyk-vy for ; Mon, 3 Jul 2017 00:04:58 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 6CACD5F341 for ; Mon, 3 Jul 2017 00:04:57 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v6304uYw030927; Mon, 3 Jul 2017 00:04:56 GMT Message-Id: <201707030004.v6304uYw030927@ip-10-146-233-104.ec2.internal> Date: Mon, 3 Jul 2017 00:04:56 +0000 From: "Impala Public Jenkins (Code Review)" To: Alex Behm , impala-cr@cloudera.com, reviews@impala.incubator.apache.org X-Gerrit-MessageType: merged Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-5547=3A_Rework_FK/PK_join_detection=2E=0A?= X-Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 X-Gerrit-ChangeURL: X-Gerrit-Commit: 9f678a74269250bf5c7ae2c5e8afd93c5b3734de In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Mon, 03 Jul 2017 00:05:02 -0000 Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5547: Rework FK/PK join detection. ...................................................................... IMPALA-5547: Rework FK/PK join detection. Reworks the FK/PK join detection logic to: - more accurately recognize many-to-many joins - avoid dim/dim joins for multi-column PKs The new detection logic maintains our existing philosophy of generally assuming a FK/PK join, unless there is strong evidence to the contrary, as follows. For each set of simple equi-join conjuncts between two tables, we compute the joint NDV of the right-hand side columns by multiplication, and if the joint NDV is significantly smaller than the right-hand side row count, then we are fairly confident that the right-hand side is not a PK. Otherwise, we assume the set of conjuncts could represent a FK/PK relationship. Extends the explain plan to include the outcome of the FK/PK detection at EXPLAIN_LEVEL > STANDARD. Performance testing: 1. Full TPC-DS run on 10TB: - Q10 improved by >100x - Q72 improved by >25x - Q17,Q26,Q29 improved by 2x - Q64 regressed by 10x - Total runtime: Improved by 2x - Geomean: Minor improvement The regression of Q64 is understood and we will try to address it in follow-on changes. The previous plan was better by accident and not because of superior logic. 2. Nightly TPC-H and TPC-DS runs: - No perf differences Testing: - The existing planner test cover the changes. - Code/hdfs run passed. Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Reviewed-on: http://gerrit.cloudera.org:8080/7257 Reviewed-by: Alex Behm Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test A testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test 16 files changed, 1,616 insertions(+), 1,018 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Zach Amsden