impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Date Fri, 30 Jun 2017 07:47:17 GMT
Alex Behm has uploaded a new patch set (#3).

Change subject: IMPALA-5547: Rework FK/PK join detection.
......................................................................

IMPALA-5547: Rework FK/PK join detection.

Reworks the FK/PK join detection logic to:
- more accurately recognize many-to-many joins
- avoid dim/dim joins for multi-column PKs

The new detection logic maintains our existing philosophy of generally
assuming a FK/PK join, unless there is strong evidence to the
contrary, as follows.

For each set of simple equi-join conjuncts between two tables, we
compute the joint NDV of the right-hand side columns by
multiplication, and if the joint NDV is significantly smaller than
the right-hand side row count, then we are fairly confident that the
right-hand side is not a PK. Otherwise, we assume the set of conjuncts
could represent a FK/PK relationship.

Extends the explain plan to include the outcome of the FK/PK detection
at EXPLAIN_LEVEL > STANDARD.

Performance testing:
1. Full TPC-DS run on 10TB:
   - Q10 improved by >100x
   - Q72 improved by >25x
   - Q17,Q26,Q29 improved by 2x
   - Q64 regressed by 10x
   - Total runtime: Improved by 2x
   - Geomean: Minor improvement
   The regression of Q64 is understood and we will try to address it
   in follow-on changes. The previous plan was better by accident and
   not because of superior logic.
2. Nightly TPC-H and TPC-DS runs:
   - No perf differences

Testing:
- The existing planner test cover the changes.
- Code/hdfs run passed.

Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
---
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
19 files changed, 1,633 insertions(+), 1,351 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/7257/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zamsden@cloudera.com>

Mime
View raw message