impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.
Date Thu, 30 Nov 2017 00:27:09 GMT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
......................................................................


Patch Set 8:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@326
PS8, Line 326:    *    condition using the LHS constant. The rewrite handles group by,
Not clear what "handles" here means. It doesn't really do anything special to group by, limit,
etc.

Maybe instead say something like: The inline view ensures that the filter is logically evaluated
against the RHS result even if the RHS contains analytics/aggregation/limit. Adding the filter
directly to any clause within the RHS statement is generally incorrect.


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@327
PS8, Line 327:    *    analytic functions, limit, and uncorrelated subqueries. Correlated
subqueries
Move last sentence into a TODO like this:

TODO: Correlated NOT IN subqueries require that column resolution be extended to handle ...


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@330
PS8, Line 330:    *    TODO: handle correlated NOT IN case
remove (see above comment)


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340
PS8, Line 340:    *    So, if C is equal to any x_i, the expression is false. Similarly, if
any
Nice description. I'm wondering if we directly translate this insight into the WHERE condition
to make it easier to understand:

WHERE C IS NULL OR tmp.x IS NULL OR C = tmp.x

That way we can also shrink/remove the last paragraph in this comment block.


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@355
PS8, Line 355:     Preconditions.checkArgument(rhs.contains(Subquery.class));
not needed because L356-357 performs the same check


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@398
PS8, Line 398:     newSubquery.analyze(rhsQuery.getAnalyzer());
This seems wrong. It might happen to work but pretty sure this will eventually lead to trouble.
Each query block needs to get its own analyzer and the hierarchy of Analyzers needs to be
fixed.

I'm thinking we should reset the rhsQuery and not analyze newSubquery here. The caller will
analyze the new ExistsPredicate with the Analyzer of the enclosing stmt (the correct analyzer).


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@714
PS8, Line 714:     if (isCorrelatedPredicate(subqueryStmt.getWhereClause(), subqueryTupleIds))
{
containsCorrelatedPredicate()?


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@277
PS8, Line 277:       // NOT IN subquery with a correlated non-equi predicate is ok if the
subquery only
[NOT] IN subquery


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@292
PS8, Line 292:         AnalyzesOk(String.format("select 1 from functional.allcomplextypes
t " +
What about queries without complex types and non-constant LHS in the IN predicate? Not sure
that case has coverage now.


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@299
PS8, Line 299:       AnalysisError(String.format("select * from functional.alltypestiny t
where id %s " +
These tests are pretty rough to follow, perhaps we should spent some time brainstorming how
we can clean up these tests (after this patch of course)


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@437
PS8, Line 437:       // Tests for constant on the left hand side of [NOT] IN.
Since we are doing a double rewrite here, it's interesting to test nested subqueries. Can
you add 1-2 tests for that?


Since you already did the work of separating these tests, it seems good to add them into a
new @Test, something like TestConstantLhsInSubqueries()


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@476
PS8, Line 476:  
You can omit this part if you like. Error message matching is prefix based.

You can use this to clean up other AnalysisError() calls. Imo it's not always useful to include
the whole failed stmt


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@482
PS8, Line 482:               "SELECT int_col FROM functional.alltypestiny b WHERE b.id = a.id
ORDER BY int_col ASC LIMIT 5");
long line, you can omit this line in the expected error message


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@520
PS8, Line 520:         + "(select * from functional.tinyinttable x where t.id = x.int_col)");
please move "+" to previous line (easier to read and more consistent with existing style)


http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2117
PS8, Line 2117: # IMPALA-1422: Constant on LHS of IN, uncorrelated subquery
Let's not add the JIRA number. We typically only do that for bugfixes, this is a new feature.


http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2118
PS8, Line 2118: select * from functional.alltypessmall where 1 in (select int_col from functional.alltypestiny);
no need to trailing semicolon


http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2140
PS8, Line 2140: |     predicates: 1 IS NULL OR CASE WHEN int_col IS NULL THEN 1 ELSE int_col
END = 1
Need to update this based on the new WHERE predicate


http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2147
PS8, Line 2147: select * from functional.alltypessmall a where 1 in (select int_col from functional.alltypestiny
b where b.id = a.id);
long lines (here and elsewhere)


http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2219
PS8, Line 2219: |  
whitespace


http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2281
PS8, Line 2281: ====
Add a case with nested subqueries to make sure the generated plan is sane


http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-query/queries/QueryTest/subquery.test@839
PS8, Line 839: ---- QUERY
Nice tests!

Move into a new file constant-lhs-in-subquery or something. This file is becoming pretty unwieldy.

We should probably split up even more sometime later.


http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-query/queries/QueryTest/subquery.test@872
PS8, Line 872: # Constant on LHS of [NOT] IN. Basics to test:
Move to the top?


http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-query/queries/QueryTest/subquery.test@901
PS8, Line 901: # LHS null, Predicate NOT IN, RHS NULL. Expect no results.
Comment does not match SQL


http://gerrit.cloudera.org:8080/#/c/8322/8/testdata/workloads/functional-query/queries/QueryTest/subquery.test@1126
PS8, Line 1126: ====
Add 1-2 tests with nested subqueries.



-- 
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 8
Gerrit-Owner: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Nov 2017 00:27:09 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message