impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Date Tue, 10 Jan 2017 23:34:20 GMT
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows
......................................................................


Patch Set 7:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/5389/6/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 388:           runtime_state_, Substitute(PREPARE_FOR_READ_FAILED_ERROR_MSG, id_));
> can't this cause us to fail a query that we used to be able to execute, sin
As discussed, this is okay because we would've needed at least as much memory to go through
the repartitioning path anyways.

Also, I looked into moving PrepareForRead into OutputUnmatchedBuild to minimize the time we
have the memory used, but I don't think this works, at least not without complicating the
code, as OutputUnmatchedBuild may be called multiple times for the same partition if we reach
out_batch->AtCapacity and calling PrepareForRead each time would be incorrect. And anyways,
OutputUnmatchedBuild will be called almost immediately after this, so the difference in memory
usage is minor.


PS6, Line 652: There were no probe rows so we skipped building the hash table. In this case,
all
> this comment doesn't explains more "what" than "why", and what is fairly cl
Done


Line 677:     if (output_unmatched_batch_iter_->AtEnd()) {
> Something I missed in the first pass. 'output_unmatched_batch_' may have ha
I  played around with this for awhile without success.

In particular, it seemed like for the queries that i ran, no resources were attached to output_unmatched_batch_.
Is there a specific situation in which you expect this to happen?


Line 702:     output_build_partitions_.pop_front();
> why is this the case?
See the comment at the top of this function.


http://gerrit.cloudera.org:8080/#/c/5389/6/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS6, Line 291:  will
             :   /// be the on
> seems redundant with "first"
Done


PS6, Line 293: titions_'.
             :   Status OutputAllBuild(RowBatch* out_batch);
> this seems to contradict the "first partition of 'output_build_partitions_'
Done


PS6, Line 306: 
> doesn't it increment the out_batch_iterator?
No, out_batch_iterator.Next() is called after OutputBuildRow, depending on the return of EvalConjuncts.

I could certainly move that logic into OutputBuildRow if you think its clearer, it would just
give OutputBuildRow a somewhat long argument list.


Line 362:   /// rows. If there are no probe rows, we just prepare the build side to be read
by
> this comment is now stale.
Done


Line 409:   RuntimeProfile::Counter* null_aware_eval_timer_;
> add a comment to explain it. e.g. what does it mean to be "skipped"?
Done


PS6, Line 477: putAllBuild() to iterat
> this also seems to contradict the earlier claim that there is only one part
Done


Line 479:   std::unique_ptr<RowBatch> output_unmatched_batch_;
> this comment is tough to understand if you haven't already read the code in
Done


http://gerrit.cloudera.org:8080/#/c/5389/7/testdata/workloads/tpch/queries/tpch-outer-joins.test
File testdata/workloads/tpch/queries/tpch-outer-joins.test:

Line 32
To be clear, I verified that this test is still testing the situation from the referenced
JIRA without these lines.


-- 
To view, visit http://gerrit.cloudera.org:8080/5389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message