spark-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From hvanhov...@apache.org
Subject spark git commit: [SPARK-17098][SQL] Fix `NullPropagation` optimizer to handle `COUNT(NULL) OVER` correctly
Date Sun, 21 Aug 2016 20:08:13 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 029789611 -> e62b29f29


[SPARK-17098][SQL] Fix `NullPropagation` optimizer to handle `COUNT(NULL) OVER` correctly

## What changes were proposed in this pull request?

Currently, `NullPropagation` optimizer replaces `COUNT` on null literals in a bottom-up fashion.
During that, `WindowExpression` is not covered properly. This PR adds the missing propagation
logic.

**Before**
```scala
scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 as bigint) windowspecdefinition(ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
```

**After**
```scala
scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
+----------------------------------------------------------------------------------------------+
|count((1 + CAST(NULL AS INT))) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)|
+----------------------------------------------------------------------------------------------+
|                                                                                        
    0|
+----------------------------------------------------------------------------------------------+
```

## How was this patch tested?

Pass the Jenkins test with a new test case.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #14689 from dongjoon-hyun/SPARK-17098.

(cherry picked from commit 91c2397684ab791572ac57ffb2a924ff058bb64f)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e62b29f2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e62b29f2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e62b29f2

Branch: refs/heads/branch-2.0
Commit: e62b29f29f44196a1cbe13004ff4abfd8e5be1c1
Parents: 0297896
Author: Dongjoon Hyun <dongjoon@apache.org>
Authored: Sun Aug 21 22:07:47 2016 +0200
Committer: Herman van Hovell <hvanhovell@databricks.com>
Committed: Sun Aug 21 22:08:05 2016 +0200

----------------------------------------------------------------------
 .../sql/catalyst/optimizer/Optimizer.scala      |  2 ++
 .../sql-tests/inputs/null-propagation.sql       |  9 +++++
 .../sql-tests/results/null-propagation.sql.out  | 38 ++++++++++++++++++++
 3 files changed, 49 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/e62b29f2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 88cc0e4..4db541f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -582,6 +582,8 @@ object NullPropagation extends Rule[LogicalPlan] {
 
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
+      case e @ WindowExpression(Cast(Literal(0L, _), _), _) =>
+        Cast(Literal(0L), e.dataType)
       case e @ AggregateExpression(Count(exprs), _, _, _) if !exprs.exists(nonNullLiteral)
=>
         Cast(Literal(0L), e.dataType)
       case e @ IsNull(c) if !c.nullable => Literal.create(false, BooleanType)

http://git-wip-us.apache.org/repos/asf/spark/blob/e62b29f2/sql/core/src/test/resources/sql-tests/inputs/null-propagation.sql
----------------------------------------------------------------------
diff --git a/sql/core/src/test/resources/sql-tests/inputs/null-propagation.sql b/sql/core/src/test/resources/sql-tests/inputs/null-propagation.sql
new file mode 100644
index 0000000..66549da
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/null-propagation.sql
@@ -0,0 +1,9 @@
+
+-- count(null) should be 0
+SELECT COUNT(NULL) FROM VALUES 1, 2, 3;
+SELECT COUNT(1 + NULL) FROM VALUES 1, 2, 3;
+
+-- count(null) on window should be 0
+SELECT COUNT(NULL) OVER () FROM VALUES 1, 2, 3;
+SELECT COUNT(1 + NULL) OVER () FROM VALUES 1, 2, 3;
+

http://git-wip-us.apache.org/repos/asf/spark/blob/e62b29f2/sql/core/src/test/resources/sql-tests/results/null-propagation.sql.out
----------------------------------------------------------------------
diff --git a/sql/core/src/test/resources/sql-tests/results/null-propagation.sql.out b/sql/core/src/test/resources/sql-tests/results/null-propagation.sql.out
new file mode 100644
index 0000000..ed3a651
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/results/null-propagation.sql.out
@@ -0,0 +1,38 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 4
+
+
+-- !query 0
+SELECT COUNT(NULL) FROM VALUES 1, 2, 3
+-- !query 0 schema
+struct<count(NULL):bigint>
+-- !query 0 output
+0
+
+
+-- !query 1
+SELECT COUNT(1 + NULL) FROM VALUES 1, 2, 3
+-- !query 1 schema
+struct<count((1 + CAST(NULL AS INT))):bigint>
+-- !query 1 output
+0
+
+
+-- !query 2
+SELECT COUNT(NULL) OVER () FROM VALUES 1, 2, 3
+-- !query 2 schema
+struct<count(NULL) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING):bigint>
+-- !query 2 output
+0
+0
+0
+
+
+-- !query 3
+SELECT COUNT(1 + NULL) OVER () FROM VALUES 1, 2, 3
+-- !query 3 schema
+struct<count((1 + CAST(NULL AS INT))) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED
FOLLOWING):bigint>
+-- !query 3 output
+0
+0
+0


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org


Mime
View raw message