spark-commits mailing list archives

From hvanhov...@apache.org
Subject spark git commit: [SPARK-19509][SQL] Grouping Sets do not respect nullable grouping columns
Date Thu, 09 Feb 2017 20:01:51 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 8bf642226 -> 00803cdb4


[SPARK-19509][SQL] Grouping Sets do not respect nullable grouping columns

## What changes were proposed in this pull request?
The analyzer currently does not check whether a column used in grouping sets is itself nullable.
As a result, the column can be marked non-nullable even though it may contain nulls, which can
cause null pointer exceptions down the line. This PR fixes that by also considering the
nullability of the column.

This is only a problem for Spark 2.1 and below. The latest master uses a different approach.
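
The rule described above can be illustrated with a small standalone sketch. It does not use
Spark: `Attr`, `GroupingSetNullability`, and the `respectInput` flag are hypothetical stand-ins
introduced only for illustration, and `nullBitmask` is assumed to have a bit set for every
grouping column that is absent from at least one grouping set.

```scala
// Hypothetical, simplified stand-in for a Catalyst attribute (not Spark's class).
final case class Attr(name: String, nullable: Boolean) {
  def withNullability(n: Boolean): Attr = copy(nullable = n)
}

object GroupingSetNullability {
  // Assumption: bit (attrLength - idx - 1) of nullBitmask is 1 when the column at
  // position idx is missing from at least one grouping set, so Expand may null it out.
  // respectInput = false mimics the old rule; true mimics the rule after this patch.
  def expandedAttributes(
      groupBy: Seq[Attr],
      nullBitmask: Int,
      respectInput: Boolean): Seq[Attr] = {
    val attrLength = groupBy.length
    groupBy.zipWithIndex.map { case (a, idx) =>
      val canBeNull = ((nullBitmask >> (attrLength - idx - 1)) & 1) == 1
      // Old rule: only the bitmask decides. New rule: a nullable input stays nullable.
      a.withNullability(if (respectInput) a.nullable || canBeNull else canBeNull)
    }
  }
}
```

With a single nullable column `e` that appears in every grouping set (bitmask `0`), the old rule
marks `e` as non-nullable, while the new rule keeps it nullable. That is the situation the
regression query on the `grouping_null` view exercises.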

Closes https://github.com/apache/spark/pull/16874

## How was this patch tested?
Added a regression test to `SQLQueryTestSuite.grouping_set`.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #16873 from hvanhovell/SPARK-19509.

(cherry picked from commit a3d5300a030fb5f1c275e671603e0745b6466735)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/00803cdb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/00803cdb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/00803cdb

Branch: refs/heads/branch-2.0
Commit: 00803cdb4ea242d57fd601934e8241a1fa4a323d
Parents: 8bf6422
Author: Stan Zhai <mail@zhaishidan.cn>
Authored: Thu Feb 9 21:01:25 2017 +0100
Committer: Herman van Hovell <hvanhovell@databricks.com>
Committed: Thu Feb 9 21:01:45 2017 +0100

----------------------------------------------------------------------
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  3 +-
 .../resources/sql-tests/inputs/grouping_set.sql | 12 ++++-
 .../sql-tests/results/grouping_set.sql.out      | 53 ++++++++++++++++----
 3 files changed, 56 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/00803cdb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 9e5ea41..7ac6229 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -313,7 +313,8 @@ class Analyzer(
 
         val attrLength = groupByAliases.length
         val expandedAttributes = groupByAliases.zipWithIndex.map { case (a, idx) =>
-          a.toAttribute.withNullability(((nullBitmask >> (attrLength - idx - 1)) & 1) == 1)
+          val canBeNull = ((nullBitmask >> (attrLength - idx - 1)) & 1) == 1
+          a.toAttribute.withNullability(a.nullable || canBeNull)
         }
 
         val expand = Expand(x.bitmasks, groupByAliases, expandedAttributes, gid, x.child)

http://git-wip-us.apache.org/repos/asf/spark/blob/00803cdb/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
----------------------------------------------------------------------
diff --git a/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql b/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
index 3594283..2b54658 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
@@ -2,7 +2,12 @@ CREATE TEMPORARY VIEW grouping AS SELECT * FROM VALUES
   ("1", "2", "3", 1),
   ("4", "5", "6", 1),
   ("7", "8", "9", 1)
-  as grouping(a, b, c, d);
+  AS grouping(a, b, c, d);
+
+CREATE TEMPORARY VIEW grouping_null AS SELECT * FROM VALUES
+  CAST(NULL AS STRING),
+  CAST(NULL AS STRING)
+  AS T(e);
 
 -- SPARK-17849: grouping set throws NPE #1
 SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS (());
@@ -13,5 +18,8 @@ SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS ((a));
 -- SPARK-17849: grouping set throws NPE #3
 SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS ((c));
 
+-- SPARK-19509: grouping set should honor input nullability
+SELECT COUNT(1) FROM grouping_null GROUP BY e GROUPING SETS (e);
 
-
+DROP VIEW IF EXISTS grouping;
+DROP VIEW IF EXISTS grouping_null;

http://git-wip-us.apache.org/repos/asf/spark/blob/00803cdb/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out
----------------------------------------------------------------------
diff --git a/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out b/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out
index edb38a5..a9c0565 100644
--- a/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 4
+-- Number of queries: 8
 
 
 -- !query 0
@@ -7,7 +7,7 @@ CREATE TEMPORARY VIEW grouping AS SELECT * FROM VALUES
   ("1", "2", "3", 1),
   ("4", "5", "6", 1),
   ("7", "8", "9", 1)
-  as grouping(a, b, c, d)
+  AS grouping(a, b, c, d)
 -- !query 0 schema
 struct<>
 -- !query 0 output
@@ -15,28 +15,63 @@ struct<>
 
 
 -- !query 1
-SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS (())
+CREATE TEMPORARY VIEW grouping_null AS SELECT * FROM VALUES
+  CAST(NULL AS STRING),
+  CAST(NULL AS STRING)
+  AS T(e)
 -- !query 1 schema
-struct<a:string,b:string,c:string,count(d):bigint>
+struct<>
 -- !query 1 output
-NULL	NULL	NULL	3
+
 
 
 -- !query 2
-SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS ((a))
+SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS (())
 -- !query 2 schema
 struct<a:string,b:string,c:string,count(d):bigint>
 -- !query 2 output
+NULL	NULL	NULL	3
+
+
+-- !query 3
+SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS ((a))
+-- !query 3 schema
+struct<a:string,b:string,c:string,count(d):bigint>
+-- !query 3 output
 1	NULL	NULL	1
 4	NULL	NULL	1
 7	NULL	NULL	1
 
 
--- !query 3
+-- !query 4
 SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS ((c))
--- !query 3 schema
+-- !query 4 schema
 struct<a:string,b:string,c:string,count(d):bigint>
--- !query 3 output
+-- !query 4 output
 NULL	NULL	3	1
 NULL	NULL	6	1
 NULL	NULL	9	1
+
+
+-- !query 5
+SELECT COUNT(1) FROM grouping_null GROUP BY e GROUPING SETS (e)
+-- !query 5 schema
+struct<count(1):bigint>
+-- !query 5 output
+2
+
+
+-- !query 6
+DROP VIEW IF EXISTS grouping
+-- !query 6 schema
+struct<>
+-- !query 6 output
+
+
+
+-- !query 7
+DROP VIEW IF EXISTS grouping_null
+-- !query 7 schema
+struct<>
+-- !query 7 output
+



