impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Date Mon, 19 Dec 2016 23:04:29 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows
......................................................................


Patch Set 3:

(10 comments)

Looking pretty good

http://gerrit.cloudera.org:8080/#/c/5389/2/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 387:     }
> I added this mostly for testability, though obviously its not ideal to expo
Yeah I think maybe just making a counter like "NumHashTableBuildsSkipped" might be the best
approach. It's hard to confirm we went down this code path otherwise.


http://gerrit.cloudera.org:8080/#/c/5389/3/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 379:     if (NeedToProcessUnmatchedBuildRows()) {
The logic for !NeedToProcessUnmatchedBuildRows() is a bit convoluted - we set *got_partition
= true but don't want to the partition to be processed.

I think we can go a bit further and discard partitions where probe_rows()->num_rows() &&
!NeedToProcessUnmatchedBuildRows() in CleanUpHashPartitions() before they're added to spilled_partitions_.
You should double-check my reasoning but I think at that point we for sure don't need the
partition. Then this would become a DCHECK.


Line 665:   // in  NextSpilledProbeRowBatch(), which adds one partition that is then processed
until
extra space


PS3, Line 666: OutputAllBUild
OutputAllBuild.


PS3, Line 667: mush
must


PS3, Line 666: ), possibly with multiple calls to OutputAllBUild()
             :   // if 'out_batch' capacity is hit
Could condense - I don't think the bit about multiple calls is needed to explain the DCHECK


Line 680:       output_unmatched_batch_iter_.reset(
Could avoid reallocating by assigning *output_unmatched_batch_iter_ = RowBatch::Iterator(...);


Line 705: 
Extra blank line here.


http://gerrit.cloudera.org:8080/#/c/5389/3/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS3, Line 289: OutputUnmatchedBuild
OutputUnmatchedBuild()


PS3, Line 480: output_unmacthed_batch_
typo


-- 
To view, visit http://gerrit.cloudera.org:8080/5389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message