impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Date Mon, 12 Dec 2016 23:32:14 GMT
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows
......................................................................


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/5389/2/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 378: 
> We don't even need to do this for many join modes. E.g. for INNER JOIN if t
Done


Line 387:     }
> The log message might be a bit confusing, since it's 
I added this mostly for testability, though obviously its not ideal to expose something in
the profile just for testing that's not going to be useful to users.

I tried playing around with making a "NumHashTablesBuilt" counter or something that could
be compared to the number of partitions, which we already have a counter for, but its tricky
because some partitions won't have hash tables built (if they're too big and get repartitioned),
and some will have their hash tables built twice (if they get evicted after building and before
being processed). I didn't look at how hard it would be to add a "NumHashTablesRebuiltAfterBeingEvicted"
counter.

Any ideas? Maybe just turning this into a counter like you say would be a little better.


Line 649: 
> Can you factor out the two cases into different functions?
Done


PS2, Line 651: // We 
> DCHECK_EQ
Done


Line 656:   }
> Can't we skip this check and just look at eos below? BTS::GetNext() has the
Done


Line 666:   // it is done by the loop in GetNext(), possibly with multiple calls to OutputAllBUild()
> Can we just save the batch iterator instead of output_unmatched_pos_?
Done


Line 671:   const int num_conjuncts = conjunct_ctxs_.size();
> This output row creation is kind of subtle and duplicated - factor out into
Done


Line 691:       }
> Why is this true?
I added a comment explaining this above the DCHECK_EQ(output_build_partitions_.size(), 1)
above here which is enforcing the same condition.


http://gerrit.cloudera.org:8080/#/c/5389/2/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS2, Line 460: ransfer to output batch
> Normally we single-quote variable names in comments so they stand out a bit
Done


Line 464:   /// (i.e. the resulting joined row passes other_join_conjuncts).
> We should mention that it's a batch from output_build_partitions_.front()
Done


http://gerrit.cloudera.org:8080/#/c/5389/2/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

Line 631: ---- QUERY
> Can you comment which join modes these are testing?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message