impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Date Fri, 09 Dec 2016 22:50:47 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/5389/2/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 378:     RETURN_IF_ERROR(input_partition_->build_partition()->build_rows()->PrepareForRead(
We don't even need to do this for many join modes. E.g. for INNER JOIN if the probe partition
is empty, we can just discard the partition.

I think you actually implemented the optimisation for the harder join types (right outer join,
etc) instead of the easier types (inner join, etc).


Line 387:     runtime_profile_->AddInfoString(
The log message might be a bit confusing, since it's 

Maybe it's not worth exposing in the profile? Or exposing it as a simple counter.


Line 649:   if (output_unmatched_pos_ != -1) {
Can you factor out the two cases into different functions?


PS2, Line 651: DCHECK
DCHECK_EQ


Line 656:         if (build_rows->num_rows() == build_rows->rows_returned()) {
Can't we skip this check and just look at eos below? BTS::GetNext() has the exact same check
to return eos.  It's kind of weird that we're doing this and ignoring eos.


Line 666:       for (RowBatch::Iterator batch_iter(output_unmatched_batch_.get(), output_unmatched_pos_);
Can we just save the batch iterator instead of output_unmatched_pos_?


Line 671:         if (join_op_ == TJoinOp::RIGHT_ANTI_JOIN) {
This output row creation is kind of subtle and duplicated - factor out into a helper function?


Line 691:       DCHECK(output_build_partitions_.empty());
Why is this true?


http://gerrit.cloudera.org:8080/#/c/5389/2/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS2, Line 460: output_unmatched_batch_
Normally we single-quote variable names in comments so they stand out a bit more. Many of
the surrounding comments don't but we should do it for new ones.


Line 464:   std::unique_ptr<RowBatch> output_unmatched_batch_;
We should mention that it's a batch from output_build_partitions_.front()


http://gerrit.cloudera.org:8080/#/c/5389/2/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

Line 631: ---- QUERY
Can you comment which join modes these are testing?


-- 
To view, visit http://gerrit.cloudera.org:8080/5389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message