impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Date Mon, 26 Sep 2016 06:38:39 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

Patch Set 19:

File be/src/exec/

PS14, Line 297: SCOPED_TIMER(build_timer_);
> It's not clear to me that the original timers were well thought out. It did
I am mostly concerned about preserving "BuildPartitionTime" and "BuildTime" which exist in
the profile now. The former seems to measure the total time to partition all input rows into
their partitions while the latter measures the total time to insert the non-spilled partitions
into their corresponding hash tables.
File be/src/exec/

PS19, Line 377: is
File be/src/exec/

PS14, Line 87: 
> They are redundant currently, but the idea is that PhjBuilder and PhjNode w
OK. To be able to probe the same hash table in parallel, it's necessary to have the thread-private
File be/src/exec/

Line 77:   builder_.reset(
It helps to add a comment like:

TODO: For now, the build_side_expr_ may be duplicated for now but the builder_ need to have
the build_side_expr_ once the builder is separated out into a different fragment.

PS19, Line 479:     if (!output_build_partitions_.empty()) {
              :       DCHECK(NeedToProcessUnmatchedBuildRows());
              :       // Flush the remaining unmatched build rows of any partitions we are
              :       // processing before moving onto the next partition.
              :       OutputUnmatchedBuild(out_batch);
              :       if (!output_build_partitions_.empty()) break;
It seems unfortunate that we cannot remove this given the change at line 588. I suppose this
is needed if multiple output batches are needed full to hold all the unmatched build rows.

The duplicated code block at line 487 almost looks like we should use a goto statement.

Do you think there is a good way to refactor the code a bit ?

Line 554:     if (state_ == PARTITIONING_PROBE) {
DCHECK(input_partition_ == NULL);

PS19, Line 599:     if (builder_->null_aware_partition() == NULL) {
              :       DCHECK(null_aware_probe_partition_ == NULL);
Not your change but I find this duplicated check with line 501 a bit confusing here. Essentially,
this is a slightly different check as it covers cases for all joins while the one at line
501 seems to cover the case in which the null aware partition has been processed completely
if I understand it correctly.

I wonder if you can move this check into the if statement at line 599 and make it if (build_->null_aware_partition()
!= NULL) continue;

And move *eos = true; to the common path shared by all join types.

PS19, Line 724: false 
I suppose it's good to avoid small buffers on all probe partitions for consistency but does
this increase the memory usage of NAAJ ?

Line 740:   if (probe_rows != NULL) {
Is there an error path in which probe_rows == NULL ? Looks like line 725 implies probe_rows
!= NULL.

PS19, Line 751: false
Same question as above.

PS19, Line 968: new
nit: newly created
File be/src/util/runtime-profile.h:

PS19, Line 129: AddChildFirst
Feel free to ignore but PrependChild() seems more appropriate.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message