impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Date Tue, 17 Oct 2017 22:06:31 GMT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE
......................................................................


Patch Set 10:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455
PS10, Line 455:     jbyteArray thrift_in_query_options) {
Not sure what 'thrift_in_query_options' means. How about using 'tquery_options' instead? Ideally,
we would not need to pass in the existing query options.


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@457
PS10, Line 457:   DeserializeThriftMsg(env, thrift_in_query_options, &options);
Needs error handling, e.g. using THROW_IF_ERROR_RET


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@458
PS10, Line 458:   const char *o = env->GetStringUTFChars(options_str, NULL);
* You need to release the string UTF chars, take a look at JniUtfCharGuard in jni-util.h

* style: const char* o


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@461
PS10, Line 461:   TParseQueryOptionsResult result;
I think it's simpler to convert all Status to an exception. That way we don't need the TParseQueryOptionsResult
at all, and the error handling is consistent (always throws an exception in case of error).


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java@119
PS10, Line 119:     // Create runtime filters. 'singleNodePlan' now points to the root of
the distributed
Comment is wrong. 'singleNodePlan' definitely does not point to the root of the distributed
plan.

Use rootFragment.getPlanRoot() instead


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@354
PS10, Line 354:     public void addTarget(RuntimeFilterTarget target) { targets_.add(target);
}
add newline before


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@397
PS10, Line 397:     Preconditions.checkState(maxNumFilters >= 0);
Why is 0 not a valid value?


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@423
PS10, Line 423:       filter.computeNdvEstimate();
My understanding is that IMPALA-3450 has been fixed and we can remove this code.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@537
PS10, Line 537:    * The assigned filters are the ones for which 'scanNode' can be used a
destination
can be used as a destination node


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@539
PS10, Line 539:    * 1. If DISABLE_ROW_RUNTIME_FILTERING query option is set, a filter is
only assigned to
If the DISABLE_ROW_RUNTIME_FILTERING query option is set ...


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@541
PS10, Line 541:    * 2. If RUNTIME_FILTER_MODE query option is set to LOCAL, a filter is only
assigned to
If the RUNTIME_FILTER_MODE query option is set ...


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@542
PS10, Line 542:    *    'scanNode' if the filter is local to the scan node.
This doesn't really explain what the LOCAL option means. How about:

... a filter is only assigned to 'scanNode' if the filter is produced within the same fragment
that contains the scan node.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@563
PS10, Line 563:       if (!isSingleNodeExec && runtimeFilterMode == TRuntimeFilterMode.LOCAL
&&
Why the !isSingleNodeExec condition? I think that !isLocalTarget implies !isSingleNodeExec.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@576
PS10, Line 576:   static private boolean IsLocalTarget(RuntimeFilter filter, ScanNode targetNode)
{
isLocalTarget()


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@583
PS10, Line 583:   static private boolean IsBoundByPartitionColumns(Analyzer analyzer, Expr
targetExpr,
isBoundByPartitionColumns()


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@92
PS10, Line 92:   public native static byte[] NativeParseQueryOptions(String optionsStr,
to be clearer call the first arg csvQueryOptions


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@93
PS10, Line 93:  
Can we get rid of the second argument and use PlannerTestBase.mergeQueryOptions() instead?


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@282
PS10, Line 282:   public static TParseQueryOptionsResult ParseQueryOptions(String optionsStr,
Public function needs a comment.

API feels a little weird. This function can throw but also return an error in the TParseQueryOptionsResult
response.

I suggest converting all errors into exceptions and returning a TQueryOptions or alternative
just modifying the given TQueryOptions in-place via PlannerTestBase.mergeQueryOptions(). Might
need to factor that out into a utility function.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@300
PS10, Line 300:   public void testRuntimeFilterPruning() {
testRuntimeFilterQueryOptions()?


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@288
PS10, Line 288:         parseQueryOptions(currentTestCase);
Why do we need to call this here and in L338? Isn't the call in L338 enough?


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test
File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test:

http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@2
PS10, Line 2: select count(*) from functional.alltypes a
When using join hints it is recommended to also use straight_join. Also please use the preferred
comment-style hint syntax, i.e.:

select /* +straight_join */ count(*) from functional.alltypes a
  join /* +broadcast */ functional.alltypes b ...

Please apply everywhere in this file.

I know we are already inconsistent in our existing tests, but that should not stop us from
improving new tests :)


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@85
PS10, Line 85: select count(*) from functional.alltypes a
Can we make it more obvious which runtime filters are the most selective? The selection is
not intuitive to follow.


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@89
PS10, Line 89: ---- QUERYOPTIONS
This is really useful! Thanks for adding it!


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@91
PS10, Line 91: ---- PLAN
To reduce the number of test lines, let's only keep those sections that are really needed.
In this case I think the DISTRIBUTEDPLAN should be enough.


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@167
PS10, Line 167: # DISABLE_ROW_RUNTIME_FILTERING is set: only partition column filters are
applied
What's the effect of setting MAX_NUM_RUNTIME_FILTERS=3 in this test? Does it select the 3
partition-pruning filters as expected?


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@252
PS10, Line 252: # RUNTIME_FILTER_MODE is set to LOCAL: only local filters are applied
Same question as above. What happens when we set MAX_NUM_RUNTIME_FILTERS=4. Does it behave
as expected?


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@418
PS10, Line 418: # RUNTIME_FILTER_MODE is OFF: no filters are applied
Is this the same as MAX_NUM_RUNTIME_FILTERS=0?



-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mjacobs@apache.org>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <philip@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Oct 2017 22:06:31 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message