impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Date Thu, 22 Jun 2017 00:50:44 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
......................................................................


Patch Set 3:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/7153/3//COMMIT_MSG
Commit Message:

PS3, Line 15: a plan node
> I hear your concerns and I do think we should think it through carefully. I
Unless we are sure this is a common user scenario, I lean towards a default value of 0. I
don't think the risk of regressions is worth the unknown benefit.

I agree with Tim that the behavior is easy to explain, but it still means users might have
to deal with tuning this value after upgrading.


http://gerrit.cloudera.org:8080/#/c/7153/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 497:         query_options->__set_disable_codegen_rows_threshold(atoi(value.c_str()));
Consider using StringParser::StringToInt() like in other places here, so we can properly validate
the input.


http://gerrit.cloudera.org:8080/#/c/7153/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 255:   // If the number of rows per node is below the threshold codegen will be automatically
If the number of rows processed per node ...

(to make it clear that these are the input rows and not the output rows)


http://gerrit.cloudera.org:8080/#/c/7153/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 279:   // If the number of rows per node is below the threshold and disable_codegen is
unset,
If the number of rows processed per node


http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 151: 
Needs PlannerTests. You can do something like PlannerTest.testComputeStatsMtDop().


Line 152:     checkForDisableCodegen(rootFragment.getPlanRoot());
Add comment why we have to do this on the fragmented plan


Line 477:     boolean isSmallQuery = visitor.valid() && visitor.getMaxRowsProcessed()
< threshold;
remove visitor.valid() and inline result into the if


Line 492:   private void checkForDisableCodegen(PlanNode planRoot) {
planRoot -> distributedPlan


Line 493:     // Determine the maximum number of rows processed by a instance of a plan node.
This comments reads like the comment on MaxRowsProcessedVisitor.maxRowsProcessed_. I think
we need crisper explanations for the two maximum numbers.


http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java:

Line 32:   // True if we should abort because if we don't have valid estimates
remove second "if"


Line 47:     if (caller instanceof HashJoinNode || caller instanceof NestedLoopJoinNode) {
if (caller instanceof JoinNode) foundJoinNode_ = true;


Line 60:           || (missingStats && !scan.hasLimit() && scan.getConjuncts().isEmpty()))
{
Why would the existence of scan conjuncts make our estimate valid despite missing stats?


Line 66:           Math.max(maxRowsProcessedPerNode_, numRows / numNodes);
Nit: Do the division as double and take the ceil, otherwise it seems like we're losing rows.


Line 77:           Math.max(maxRowsProcessedPerNode_, numRows / numNodes);
Nit: Do the division as double and take the ceil, otherwise it seems like we're losing rows.


Line 81:   public boolean valid() {
single line


Line 96:     return foundJoinNode_;
Preconditions.checkState(valid_); seems appropriate here as well


http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test
File testdata/workloads/functional-query/queries/QueryTest/codegen.test:

Line 2: ---- QUERY
Any reason not to make these PlannerTests?


http://gerrit.cloudera.org:8080/#/c/7153/3/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

Line 139:                                       disable_codegen_rows_threshold_options=[5000],
Why not 0? For the same reason you use 0 below.


-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Juan Yu <jyu@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message