impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Date Fri, 09 Dec 2016 22:50:47 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows

Patch Set 2:

File be/src/exec/

Line 378:     RETURN_IF_ERROR(input_partition_->build_partition()->build_rows()->PrepareForRead(
We don't even need to do this for many join modes. E.g. for INNER JOIN if the probe partition
is empty, we can just discard the partition.

I think you actually implemented the optimisation for the harder join types (right outer join,
etc) instead of the easier types (inner join, etc).

Line 387:     runtime_profile_->AddInfoString(
The log message might be a bit confusing, since it's 

Maybe it's not worth exposing in the profile? Or exposing it as a simple counter.

Line 649:   if (output_unmatched_pos_ != -1) {
Can you factor out the two cases into different functions?

PS2, Line 651: DCHECK

Line 656:         if (build_rows->num_rows() == build_rows->rows_returned()) {
Can't we skip this check and just look at eos below? BTS::GetNext() has the exact same check
to return eos.  It's kind of weird that we're doing this and ignoring eos.

Line 666:       for (RowBatch::Iterator batch_iter(output_unmatched_batch_.get(), output_unmatched_pos_);
Can we just save the batch iterator instead of output_unmatched_pos_?

Line 671:         if (join_op_ == TJoinOp::RIGHT_ANTI_JOIN) {
This output row creation is kind of subtle and duplicated - factor out into a helper function?

Line 691:       DCHECK(output_build_partitions_.empty());
Why is this true?
File be/src/exec/partitioned-hash-join-node.h:

PS2, Line 460: output_unmatched_batch_
Normally we single-quote variable names in comments so they stand out a bit more. Many of
the surrounding comments don't but we should do it for new ones.

Line 464:   std::unique_ptr<RowBatch> output_unmatched_batch_;
We should mention that it's a batch from output_build_partitions_.front()
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

Line 631: ---- QUERY
Can you comment which join modes these are testing?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message