impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Date Tue, 13 Sep 2016 21:42:02 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

Patch Set 14:


Some comments and questions.
File be/src/exec/

PS14, Line 297: SCOPED_TIMER(build_timer_);
We used to have two different timers in PHJ for measuring the time to hash the input rows
into different partitions and the time to build the hash tables. It would be good to be able
to retain that fined grained tracking PHJBuilder.
File be/src/exec/

Line 179:     largest_fraction = max(largest_fraction,
DCHECK_GT(num_build_rows, 0);

PS14, Line 318: a hash table
hash tables

Line 341:   // building hash tables to avoid building hash tables for partitions we'll spill
That's an interesting comment. If I understand it correctly, it means that calling InitSpilledPartitionProbeStreams()
may cause more partitions to be spilled as it needs to pin write buffers for the spilled streams
so we should make sure we allocate all the write streams to trigger all the extra spills which
may happen before building hash tables. If so, would you mind elaborating a bit in the comment:

"Allocate probe buffers for all partitions that are already spilled. Do this before building
hash tables as allocating probe buffers may cause some more partitions to be spilled. This
avoids wasted work on building hash tables for partitions which end up being spilled eventually."

Do you know how often this case happens ? I suppose this complication will be gone eventually
once reservation is in place.

Line 401:       RETURN_IF_ERROR(SpillPartition());
May help to document that failure to find any partition to spill (e.g. all partitions are
spilled) will return an error code. It looks as if we may be in an infinite loop on the first

PS14, Line 529: PhjBuilder::HashTableStoresNulls()
This seems to belong to PartitionedHashJoinNode conceptually.

Line 646:   do {
Please see comments in BlockingJoinNode,  it would be great to retain timers for InsertBatch()
and ProcessBuildInput() separately for finer grain tracking.

PS14, Line 782: process_build_batch_fn_level0
File be/src/exec/partitioned-hash-join-builder.h:

Line 425: 
nit: unnecessary blank line ?
File be/src/exec/

PS14, Line 151: 
This may be important information to retain.

PS14, Line 87: Expr::CreateExprTree(pool_, eq_join_conjunct.right
Does this overlap with CreateExprTree() in PhjBuilder::Init() ? Same question for build_expr_ctxs_.
If they are not redundant, mind commenting on how they are used differently ?

Line 213:   for (unique_ptr<ProbePartition>& partition : spilled_partitions_)
missing { }

PS14, Line 494: to output

PS14, Line 585: hash_partitions_

Line 594:       continue;
Is there a reason why we cannot call OutputUnmatchedBuild() directly at this point ?

PS14, Line 996: s)
next_state ?
File be/src/exec/partitioned-hash-join-node.h:

PS12, Line 81: hash_partitions_

PS12, Line 88: probe_hash_partitions_

PS12, Line 112: input_partition_
File be/src/exec/partitioned-hash-join-node.h:

PS14, Line 437: 
Is this merged into build_timer_ in BlockingJoinNode now ? It may be helpful for debugging
to maintain two separate timers.

PS14, Line 81: hash_partitions_

PS14, Line 84: This is the only phase 
These are the only phases

PS14, Line 88: probe_hash_partitions_

PS14, Line 112: input_partition_

PS14, Line 118: input_partition_

PS14, Line 298: spilled_partitions_

PS14, Line 302: probe_batch_pos_

PS14, Line 306: probe_batch_pos_

PS14, Line 309: input_partition_

PS14, Line 408: null_aware_

Line 445:     /// block cannot be acquired. "delete_on_read" mode is used, so blocks in the
... used for the buffered tuple stream, so..

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message