impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Date Tue, 13 Sep 2016 21:42:02 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
......................................................................


Patch Set 14:

(31 comments)

Some comments and questions.

http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/blocking-join-node.cc
File be/src/exec/blocking-join-node.cc:

PS14, Line 297: SCOPED_TIMER(build_timer_);
We used to have two different timers in PHJ for measuring the time to hash the input rows
into different partitions and the time to build the hash tables. It would be good to be able
to retain that fined grained tracking PHJBuilder.


http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 179:     largest_fraction = max(largest_fraction,
DCHECK_GT(num_build_rows, 0);


PS14, Line 318: a hash table
hash tables


Line 341:   // building hash tables to avoid building hash tables for partitions we'll spill
anyway.
That's an interesting comment. If I understand it correctly, it means that calling InitSpilledPartitionProbeStreams()
may cause more partitions to be spilled as it needs to pin write buffers for the spilled streams
so we should make sure we allocate all the write streams to trigger all the extra spills which
may happen before building hash tables. If so, would you mind elaborating a bit in the comment:

"Allocate probe buffers for all partitions that are already spilled. Do this before building
hash tables as allocating probe buffers may cause some more partitions to be spilled. This
avoids wasted work on building hash tables for partitions which end up being spilled eventually."

Do you know how often this case happens ? I suppose this complication will be gone eventually
once reservation is in place.


Line 401:       RETURN_IF_ERROR(SpillPartition());
May help to document that failure to find any partition to spill (e.g. all partitions are
spilled) will return an error code. It looks as if we may be in an infinite loop on the first
glance.


PS14, Line 529: PhjBuilder::HashTableStoresNulls()
This seems to belong to PartitionedHashJoinNode conceptually.


Line 646:   do {
Please see comments in BlockingJoinNode,  it would be great to retain timers for InsertBatch()
and ProcessBuildInput() separately for finer grain tracking.


PS14, Line 782: process_build_batch_fn_level0
'process_build_batch_fn_level0_'


http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

Line 425: 
nit: unnecessary blank line ?


http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

PS14, Line 151: 
              : 
              : 
This may be important information to retain.


PS14, Line 87: Expr::CreateExprTree(pool_, eq_join_conjunct.right
Does this overlap with CreateExprTree() in PhjBuilder::Init() ? Same question for build_expr_ctxs_.
If they are not redundant, mind commenting on how they are used differently ?


Line 213:   for (unique_ptr<ProbePartition>& partition : spilled_partitions_)
missing { }


PS14, Line 494: to output
outputting


PS14, Line 585: hash_partitions_
'hash_partitions_'


Line 594:       continue;
Is there a reason why we cannot call OutputUnmatchedBuild() directly at this point ?


PS14, Line 996: s)
next_state ?


http://gerrit.cloudera.org:8080/#/c/3873/12/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS12, Line 81: hash_partitions_
'hash_partitions_'


PS12, Line 88: probe_hash_partitions_
'probe_hash_partitions_'


PS12, Line 112: input_partition_
'input_partition_'


http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS14, Line 437: 
Is this merged into build_timer_ in BlockingJoinNode now ? It may be helpful for debugging
to maintain two separate timers.


PS14, Line 81: hash_partitions_
'hash_partitions_'


PS14, Line 84: This is the only phase 
These are the only phases


PS14, Line 88: probe_hash_partitions_
'probe_hash_partitions_'


PS14, Line 112: input_partition_
'input_partition_'


PS14, Line 118: input_partition_
'input_partition_'


PS14, Line 298: spilled_partitions_
'spilled_partitions_'


PS14, Line 302: probe_batch_pos_
'probe_batch_pos_'


PS14, Line 306: probe_batch_pos_
'probe_batch_pos_'


PS14, Line 309: input_partition_
'input_partition_'


PS14, Line 408: null_aware_
'null_aware_'


Line 445:     /// block cannot be acquired. "delete_on_read" mode is used, so blocks in the
stream
... used for the buffered tuple stream, so..


-- 
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message