impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5226: handle single subquery in or predicate
Date Thu, 09 Jul 2020 23:07:54 GMT
Hello Aman Sinha, Shant Hovsepian, Zoltan Borok-Nagy, David Rorke, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16152

to look at the new patch set (#9).

Change subject: IMPALA-5226: handle single subquery in or predicate
......................................................................

IMPALA-5226: handle single subquery in or predicate

This patch supports a subset of cases where a subquery appears
inside an OR predicate in a WHERE or HAVING clause.

The approach used is to rewrite the subquery into
a many-to-one LEFT OUTER JOIN with the subquery and
then replace the subquery in the expression with a
reference to the single select list expression of
the subquery. This works because:
* A many-to-one LEFT OUTER JOIN returns one output row
  for each left input row, meaning that for every row
  in the original query before the rewrite, we get
  the same row plus a single matched row from the subquery
* Expressions can be rewritten to refer to a slotref from
  the right side of the LEFT OUTER JOIN without affecting
  semantics. E.g. an IN subquery becomes <slot> IS NOT NULL
  or <operator> (<subquery>) becomes <operator> <slot>.
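
The equivalence can be checked on a toy example. This is a minimal
sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in
engine rather than Impala. The tables t and s, their columns, and the
hand-written rewritten query are hypothetical illustrations, not output
of the patch. DISTINCT on the subquery is what makes the join
many-to-one.

```python
import sqlite3

# In-memory database with hypothetical tables; s contains duplicates and a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, a INTEGER, b INTEGER);
    CREATE TABLE s (x INTEGER);
    INSERT INTO t VALUES (1, 10, 1), (2, 20, 9), (3, 30, 9), (4, NULL, 9), (5, 40, 2);
    INSERT INTO s VALUES (20), (20), (30), (NULL);
""")

# Original form: an IN subquery inside a disjunction in the WHERE clause.
original = """
    SELECT id FROM t
    WHERE a IN (SELECT x FROM s) OR b < 5
    ORDER BY id
"""

# Hand-written rewrite: LEFT OUTER JOIN against the de-duplicated subquery,
# with the IN predicate replaced by a NULL check on the joined slot.
rewritten = """
    SELECT t.id
    FROM t LEFT OUTER JOIN (SELECT DISTINCT x FROM s) v ON t.a = v.x
    WHERE v.x IS NOT NULL OR t.b < 5
    ORDER BY t.id
"""

original_rows = conn.execute(original).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()
print(original_rows == rewritten_rows)
```

Without the DISTINCT, the duplicate value in s would make the join
produce row 2 twice, so the rewritten query would no longer return one
output row per left input row.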

This does not affect SELECT list subqueries, which are
rewritten using a different mechanism that can already
support some subqueries in disjuncts.

Correlated and uncorrelated subqueries are both supported, subject
to the limitations below.

Limitations:
* Only one subquery per predicate is supported. The rewriting approach
  should generalize to multiple subqueries but other code needs
  refactoring to handle this case.
* EXISTS and NOT EXISTS subqueries are not supported. The rewriting
  approach can generalize to that, but we need to add or pick a
  select list item from the subquery to check for IS NULL/IS NOT NULL
  and a little more work is required to do that correctly.
* NOT IN is not supported because of the special NULL semantics.
* Subqueries with aggregates + GROUP BY are not supported because
  we rely on adding DISTINCT to the select list and we don't
  support DISTINCT + aggregations because of IMPALA-5098.
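
The NOT IN limitation can be illustrated with a small sketch, again
assuming SQLite through Python's sqlite3 module rather than Impala.
The tables and the "naive" rewrite (swapping IS NOT NULL for IS NULL)
are hypothetical and are not something the patch does. They only show
why the same approach does not carry over directly when the subquery
can return NULL.

```python
import sqlite3

# Hypothetical tables; the subquery source s contains a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, a INTEGER);
    CREATE TABLE s (x INTEGER);
    INSERT INTO t VALUES (1, 10), (2, 20);
    INSERT INTO s VALUES (20), (NULL);
""")

# NOT IN evaluates to NULL for a = 10 because of the NULL in s,
# and to FALSE for a = 20, so no row passes the filter.
not_in_rows = conn.execute(
    "SELECT id FROM t WHERE a NOT IN (SELECT x FROM s) ORDER BY id"
).fetchall()

# Naive join-based rewrite: treats "no match" as "not in", which ignores
# the NULL in s and therefore lets row 1 through.
naive_rows = conn.execute("""
    SELECT t.id
    FROM t LEFT OUTER JOIN (SELECT DISTINCT x FROM s) v ON t.a = v.x
    WHERE v.x IS NULL
    ORDER BY t.id
""").fetchall()

print(not_in_rows, naive_rows)
```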

Tests:
* Positive analysis tests for IN and binary predicate operators.
* Negative analysis tests for unsupported subquery operators.
* Negative analysis tests for multiple subqueries.
* Negative analysis tests for runtime scalar subqueries.
* Positive and negative analysis tests for aggregations in subquery.
* TPC-DS Query 45 planner and query tests.
* Targeted planner tests for various supported queries.
* Targeted functional tests to confirm plans are executable and
  return correct results. These exercise a mix of the supported
  features - correlated/uncorrelated, aggregate functions,
  EXISTS/comparator, etc.
* Tests for BETWEEN predicate, which is supported as a side-effect
  of being rewritten during analysis.

Change-Id: I64588992901afd7cd885419a0b7f949b0b174976
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q45.test
M tests/query_test/test_tpcds_queries.py
M tests/util/parse_util.py
10 files changed, 1,272 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/16152/9
-- 
To view, visit http://gerrit.cloudera.org:8080/16152
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64588992901afd7cd885419a0b7f949b0b174976
Gerrit-Change-Number: 16152
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsinha@cloudera.com>
Gerrit-Reviewer: David Rorke <drorke@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <shant@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
