impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4794: Partition distinct expr for skew data
Date Sat, 12 Aug 2017 00:48:51 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4794: Partition distinct expr for skew data

Patch Set 1:

Commit Message:

Line 7: IMPALA-4794: Partition distinct expr for skew data
Summary msg is not very clear to me, e.g. "for data skew" should be "handling data skew".
I suggest something like:

IMPALA-4794: Grouping distinct agg plan robust to data skew

Line 9: Currently in an aggregation with grouping and distinct expr, there
Needs cleanup for grammar and wording. Writing crisp prose is very important for readers.

Here are some pointers. The first sentence should describe the problem and high-level solution.
For example:

This patch changes the query plan for grouping distinct aggregations to be more robust to
data skew in the grouping expressions.

The existing plan ... etc.

The new plan ... etc.

Line 19: Testing: In planner test, some plans with distinct aggregation change.
Testing: Modified existing planner tests which already provide sufficient coverage.
File fe/src/main/java/org/apache/impala/planner/

Line 861:   private PlanFragment createPhase2DistinctAggregationFragment(AggregationNode node,
please rename 'node' to 'phase2AggNode' for clarity

Line 864:     ArrayList<Expr> groupingExprs = node.getAggInfo().getGroupingExprs();
Clean up: these groupingExprs are not needed anymore, and neither is the hasGrouping variable.
Remove these and directly check node.getAggInfo().getGroupingExprs().isEmpty() in L924

Line 867:     // The first-phase aggregation node is already in the child fragment.
first-phase -> phase-1

Line 871:     List<Expr> partitionExprs = null;
combine with L883

Line 875:     // - merge fragment, hash-partitioned on grouping and distinct exprs:
first merge fragment

Line 876:     //   * merge agg of phase 1
let's use "phase-N" consistently (with hyphen)

Line 878:     // - merge fragment 2, partitioned on grouping exprs or unpartitioned with no
second merge fragment

Line 887:     PlanFragment mergeFragment = null;
use firstMergeFragment and secondMergeFragment variables to make the distinction clearer

Line 917:     if (mergeFragment != childFragment) fragments.add(mergeFragment);
move at the end of the the else block in L895

Line 920:     // Any limit should be placed in the final merge aggregation node
I think it's clearer to cluster the code for limit setting/unsetting together, e.g. in L939
you can do something like:

// Limit should be applied at the final merge aggregation node

Line 922:     node.unsetLimit();
If we have grouping, we should make the phase2AggNode streaming like in L899

Line 925:       // place the merge aggregation of the 2nd phase in an unpartitioned fragment;
remove comment, doesn't add any value

Line 929:         node.getAggInfo().getMergeAggInfo().getGroupingExprs());
indent 4

Line 931:     // add preceding merge fragment at end
not sure what this comment means, I don't think it makes sense after your changes

Line 934:       node.getAggInfo().getMergeAggInfo());
indent 4

Line 936:     // Transfer having predicates. If hasGrouping == true, the predicates should
adjust comment to reflect the new code

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bdada0e328b555900c7b7ff8aabc8eb15ae8fa9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-HasComments: Yes

View raw message