impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Date Mon, 01 Aug 2016 19:52:51 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations

Patch Set 1:


Nice! I mainly focused on the backend, assuming Alex will also look at the FE changes.
File be/src/exec/

Line 130:   if (group_count_ < max_groups_) {
We can turn this into a one-liner without an else.

if (group_count_ >= max_groups_) return Status::OK();

I was thinking about whether it's worth it to try and optimise out the branch with codegen.
It'd be straightforward if you a have non-inlined function call has_max_groups() or even just
max_groups() and use FindAndReplaceConstant(). I think the difference in perf is probably
negligible right now since AddIntermediateTuple() isn't codegen'd, so I'm ok either way

Line 184:   if (group_count_ >= max_groups_ && aggregate_evaluators_.empty()) {
I don't think there's any advantage to cross-compiling this code, so it'd be better to do
this check in the caller.

PS1, Line 185: child_eos_
This is overloading this variable. I'd rather keep the invariant that child_eos_ means the
child returned eos.
File be/src/exec/

PS1, Line 330: group_count_ >= max_groups_ && aggregate_evaluators_.empty()
I think it would be good to factor this out into an inline function, e.g. IsDoneProcessingInput()
- or some better name. Will make it easier to follow the flow of the calling functions and
avoid some duplication of logic.

PS1, Line 331: eos
I think this is overloading the meaning of this 'eos' variable. Maybe rename it to 'done'
so the name doesn't imply that there's the end of some stream.

Actually, it seems like it would make more sense to do this check after batch.Reset() below
or before calling GetNext() above, then just breaking out of the loop. Otherwise we're fetching
an extra batch that we're not processing.

Line 573:     child_batch_->Reset(); // All rows from child_batch_ were processed.
I think we can move the check from ProcessBatchStreaming() to either after this Reset() or
before GetNext() above, instead of setting child_eos_ to break out of this loop.
File be/src/exec/partitioned-aggregation-node.h:

Line 295:   // ExecNode.
Document that it's always a positive value, and set to the max value of int64 if there's no
File fe/src/main/java/com/cloudera/impala/planner/

Line 806:     if (!(node.getAggInfo().getAggregateExprs().isEmpty()
This needs a comment to explain the intent. Or, maybe better, factor this check and the one
below out into a shared method with a meaningful name.

PS1, Line 807:       || 
|| normally goes on the previous line.

PS1, Line 911: !(node.getAggInfo().getAggregateExprs().isEmpty()
             :               || node.getAggInfo().getGroupingExprs().isEmpty()
Make this a method of AggInfo?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I59c5b7af7a73ccdbc5496b28eacb9b6859d202bc
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message