impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tianyi Wang (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4794: Grouping distinct agg plan robust to data skew
Date Mon, 14 Aug 2017 22:09:49 GMT
Tianyi Wang has uploaded a new patch set (#4).

Change subject: IMPALA-4794: Grouping distinct agg plan robust to data skew
......................................................................

IMPALA-4794: Grouping distinct agg plan robust to data skew

This patch changes the query plan for grouping distinct aggregations to
be more robust to data skew in the grouping expressions. The existing
plan partitions data between phase-1 and phase-2 by the grouping exprs.
Under this strategy the data skewness on the grouping exprs directly
impacts performance. The new plan partitions data by both the grouping
exprs and distinct agg exprs, then adds one more aggregation and
exchange node. The new plan is more robust to data skew but does more
work than the old plan.

Testing: Modified existing planner tests which already provide
sufficient coverage. The pattern is that the distinct agg exprs are
added to the first exchange node, followed by an additional merge agg
and exchange node.

Change-Id: I7bdada0e328b555900c7b7ff8aabc8eb15ae8fa9
---
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
7 files changed, 215 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/7643/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7643
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bdada0e328b555900c7b7ff8aabc8eb15ae8fa9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <twang@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <twang@cloudera.com>

Mime
View raw message