spark-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From yh...@apache.org
Subject spark git commit: [SPARK-16181][SQL] outer join with isNull filter may return wrong result
Date Tue, 28 Jun 2016 17:26:30 GMT
Repository: spark
Updated Branches:
  refs/heads/master 0923c4f56 -> 1f2776df6


[SPARK-16181][SQL] outer join with isNull filter may return wrong result

## What changes were proposed in this pull request?

The root cause is: the output attributes of outer join are derived from its children, while
they are actually different attributes(outer join can return null).

We have already added some special logic to handle it, e.g. `PushPredicateThroughJoin` won't
push down predicates through outer join side, `FixNullability`.

This PR adds one more special logic in `FoldablePropagation`.

## How was this patch tested?

new test in `DataFrameSuite`

Author: Wenchen Fan <wenchen@databricks.com>

Closes #13884 from cloud-fan/bug.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1f2776df
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1f2776df
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1f2776df

Branch: refs/heads/master
Commit: 1f2776df6e87a84991537ac20e4b8829472d3462
Parents: 0923c4f
Author: Wenchen Fan <wenchen@databricks.com>
Authored: Tue Jun 28 10:26:01 2016 -0700
Committer: Yin Huai <yhuai@databricks.com>
Committed: Tue Jun 28 10:26:01 2016 -0700

----------------------------------------------------------------------
 .../org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 8 ++++++++
 .../test/scala/org/apache/spark/sql/DataFrameSuite.scala    | 9 +++++++++
 2 files changed, 17 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/1f2776df/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 2bca31d..9bc8cea 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -688,6 +688,14 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         case c: Command =>
           stop = true
           c
+        // For outer join, although its output attributes are derived from its children,
they are
+        // actually different attributes: the output of outer join is not always picked from
its
+        // children, but can also be null.
+        // TODO(cloud-fan): It seems more reasonable to use new attributes as the output
attributes
+        // of outer join.
+        case j @ Join(_, _, LeftOuter | RightOuter | FullOuter, _) =>
+          stop = true
+          j
         case p: LogicalPlan if !stop => p.transformExpressions {
           case a: AttributeReference if foldableMap.contains(a) =>
             foldableMap(a)

http://git-wip-us.apache.org/repos/asf/spark/blob/1f2776df/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
index 6a0a7df..9d53be8 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@@ -1562,4 +1562,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     val df = Seq(1, 1, 2).toDF("column.with.dot")
     checkAnswer(df.distinct(), Row(1) :: Row(2) :: Nil)
   }
+
+  test("SPARK-16181: outer join with isNull filter") {
+    val left = Seq("x").toDF("col")
+    val right = Seq("y").toDF("col").withColumn("new", lit(true))
+    val joined = left.join(right, left("col") === right("col"), "left_outer")
+
+    checkAnswer(joined, Row("x", null, null))
+    checkAnswer(joined.filter($"new".isNull), Row("x", null, null))
+  }
 }


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org


Mime
View raw message