impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.
Date Sat, 04 Nov 2017 00:17:47 GMT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
......................................................................


Patch Set 6:

(12 comments)

Some high-level comments before I dig deeper.

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@94
PS2, Line 94:     // TODO: derived slot refs (e.g., star-expanded) will not have rawPath set.
Why can't this be addressed in this patch?


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@287
PS2, Line 287:    * rewritten, null is returned. If 'inPred' is rewritten, the resulting expression
... and the RHS is a subquery.


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@299
PS2, Line 299:    * 2) Predicate is NOT IN and RHS returns a single row.
What if the RHS subquery is correlated?


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319
PS2, Line 319:    *     NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL THEN C ELSE
e END) = C)
Is the transformation even correct if RHS is correlated?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@312
PS6, Line 312:    *    a) No group by or analytic function in subquery.
Ideally, we should not have to distinguish between cases 3a and 3b, and we should always do
this rewrite:

C NOT IN (RHS)

Rewrites to:

NOT EXISTS (SELECT 1 FROM (RHS) v WHERE C IS NULL OR IFNULL(v.e, C) = C)

My understanding is that such a rewrite dues not work for case 3a when there is correlation,
so we use an alternate rewrite that does not require a new inline view. I think we should
reorganize the comments here to reflect this line of thinking and the limitations. For example,
something like this:


3) Predicate is NOT IN and RHS returns multiple rows.

General rewrite: <the generic rewrite using inline view>
<briefly explain why we cannot use this rewrite to motivate cases a and b below>

a) No group by or analytic
<explain why the general rewrite does not work>

b) Group by  or analytic function in subquery
<explain why the general rewrite can be simplified>


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@316
PS6, Line 316:    *    REWRITE:
Use IFNULL() instead of CASE to simplify.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319
PS6, Line 319:    *     NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL THEN C ELSE
e END) = C)
I think there's another case where this rewrite is not correct: when the subquery has an order
by + limit. Basically, we are reimplementing the "predicate push down" correctness checks
here.

I'm wondering whether we should reduce the scope of this patch and only allow those case where
the generic rewrite works. In a separate JIRA we should consider whether we can make the generic
rewrite also work for case 3a, and if not what to do instead.

What do you think?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@321
PS6, Line 321:    *    Example:
The following query gives me an IllegalStateException:

explain select id from functional.alltypes t1 where 10 not in (select id from functional.alltypestiny
order by id limit 2);

Existing bug or related to this change?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@328
PS6, Line 328:    *    Note: it useful to think of C NOT IN (RHS) as the boolean expansion:
it is useful


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340
PS6, Line 340:    *    b) Group by  or analytic function in subquery.
extra space after "by"


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@345
PS6, Line 345:    *      NOT EXISTS (SELECT x FROM (RHS) t WHERE C = t or t IS NOT NULL)
What does C=t mean here?
"t" is a table alias and cannot be compared to a C

It's not immediately clear to me why the rewrite here is equivalent to the generic rewrite
described in 3a. Please explain (see my suggestion above). Alternatively, I'm also ok if you
want to just use the generic rewrite here.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@354
PS6, Line 354:    *     - Assumes that subquery is uncorrelated, which is currently a requirement.
What does this note refer to? Above you state that "All cases apply to both correlated and
uncorrelated subqueries"



-- 
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:17:47 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message