impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Date Wed, 28 Sep 2016 01:10:53 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

Patch Set 14:

File be/src/exec/

PS14, Line 297: SCOPED_TIMER(build_timer_);
> I am mostly concerned about preserving "BuildPartitionTime" and "BuildTime"
Ah, you're right, the older code for some reason didn't include the partitioning time in BuildTime.
This doesn't make a lot of sense to me, since build_timer_ is a BlockingJoinNode-level concept,
where we shouldn't care about the implementation details of the build phase. With the old
code, I don't think someone would guess that meaning of "BuildTime" without looking at the

I think the new decomposition of BuildTime covering the whole initial build phase, then BuildRowsPartitionTime
and HashTablesBuildTime covering the time spent in those operations is more intuitive.

I guess it could cause some confusion comparing profiles from different versions, but I think
it would be better to clean things up here instead of trying to preserve the profile counters
exactly. The profile layout is changing anyway with the patch so probably a good time to do
File be/src/exec/

PS19, Line 377: ;
> if
File be/src/exec/

Line 77:   const vector<TEqJoinCondition>& eq_join_conjuncts =
> It helps to add a comment like:
I added a different comment with the same intent - just pointing out that allocating PhjBuidler
like this is a temporary measure.

PS19, Line 479:   while (true) {
              :     DCHECK(status.ok());
              :     DCHECK_NE(state_, PARTITIONING_BUILD) << "Should not be in GetNext()";
              :     RETURN_IF_CANCELLED(state);
              :     RETURN_IF_ERROR(QueryMaintenance(state));
              :     if (!output_build_partitions_.empty()) {
> It seems unfortunate that we cannot remove this given the change at line 58
Exactly, it's necessary for that case.

My feeling is that we shouldn't try to refactor this loop piecemeal, rather we should just
figure out the right abstraction/organisation (maybe a state machine plus some logical decomposition
into functions?) and completely clean it up. Not in this patch though.

Line 554:       if (out_batch->AtCapacity() || ReachedLimit()) break;
> DCHECK(input_partition_ == NULL);

PS19, Line 599:     if (got_partition) continue; // Probe the spilled partition.
> Not your change but I find this duplicated check with line 501 a bit confus
Done. Should be equivalent, since the null_aware_partition is only non-NULL for NAAJ, and
there are DCHECKs in PrepareNullAwarePartition() that both this partitions are non-NULL (it's
not reachable otherwise because of the checks up the top of the loop).

PS19, Line 724: 
> I suppose it's good to avoid small buffers on all probe partitions for cons
Yes potentially, it could increase it by up to 16MB but I think it's worth accepting that
increase for a rarely-used join type in order to avoid the possibility of spilling while probing.

Line 740:       state, this, builder_->null_aware_partition(), std::move(probe_rows)));
> Is there an error path in which probe_rows == NULL ? Looks like line 725 im
Should be ok to remove it.

PS19, Line 751: 
> Same question as above.
See response above.

PS19, Line 968: ore
> nit: newly created
File be/src/util/runtime-profile.h:

PS19, Line 129: AddChildFirst
> Feel free to ignore but PrependChild() seems more appropriate.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message