spark-commits mailing list archives

From r...@apache.org
Subject spark git commit: [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints
Date Thu, 25 Aug 2016 04:24:27 GMT
Repository: spark
Updated Branches:
  refs/heads/master 3a60be4b1 -> ac27557eb


[SPARK-17228][SQL] Not infer/propagate non-deterministic constraints

## What changes were proposed in this pull request?

Filters based on non-deterministic constraints shouldn't be pushed down in the query plan,
so inferring them is unnecessary, confusing, and a potential source of bugs. This patch
simplifies the inference logic by ignoring them.
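
As a rough illustration of the idea, the sketch below models the filter on a toy expression tree. It does not use Spark's Catalyst classes: `Expr`, `Attr`, `Rand`, `EqualTo`, `IsNotNull`, and `validConstraints` are hypothetical stand-ins defined here, and the rule that an expression is deterministic only if all of its children are is an assumption of this sketch.

```scala
// Toy model only: these types are hypothetical stand-ins, not Catalyst classes.
sealed trait Expr {
  def references: Set[String]   // names of the columns the expression reads
  def deterministic: Boolean    // false if evaluation can differ between calls
}

case class Attr(name: String) extends Expr {
  val references: Set[String] = Set(name)
  val deterministic: Boolean = true
}

// Stand-in for a non-deterministic expression such as a random number generator.
case class Rand(seed: Long) extends Expr {
  val references: Set[String] = Set.empty
  val deterministic: Boolean = false
}

case class EqualTo(left: Expr, right: Expr) extends Expr {
  val references: Set[String] = left.references ++ right.references
  // Assumption of this sketch: deterministic only if both children are.
  val deterministic: Boolean = left.deterministic && right.deterministic
}

case class IsNotNull(child: Expr) extends Expr {
  val references: Set[String] = child.references
  val deterministic: Boolean = child.deterministic
}

object ConstraintFilterSketch {
  // Keep a candidate constraint only if it refers to at least one output
  // column, refers to nothing outside the output, and is deterministic.
  def validConstraints(candidates: Set[Expr], outputSet: Set[String]): Set[Expr] =
    candidates.filter { c =>
      c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic
    }

  def main(args: Array[String]): Unit = {
    val a = Attr("a")
    val candidates: Set[Expr] = Set(EqualTo(a, Rand(0L)), IsNotNull(a))
    // The equality against Rand is dropped; the null check on `a` is kept.
    println(validConstraints(candidates, Set("a", "b", "c")))
  }
}
```

In this toy model, a filter on `a = Rand(0)` still implies that `a` is not null, so that constraint is kept while the non-deterministic equality is discarded.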

## How was this patch tested?

Added a new test in `ConstraintPropagationSuite`.

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>

Closes #14795 from sameeragarwal/deterministic-constraints.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac27557e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac27557e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac27557e

Branch: refs/heads/master
Commit: ac27557eb622a257abeb3e8551f06ebc72f87133
Parents: 3a60be4
Author: Sameer Agarwal <sameerag@cs.berkeley.edu>
Authored: Wed Aug 24 21:24:24 2016 -0700
Committer: Reynold Xin <rxin@databricks.com>
Committed: Wed Aug 24 21:24:24 2016 -0700

----------------------------------------------------------------------
 .../spark/sql/catalyst/plans/QueryPlan.scala       |  3 ++-
 .../plans/ConstraintPropagationSuite.scala         | 17 +++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ac27557e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
index 8ee31f4..0fb6e7d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
@@ -35,7 +35,8 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
       .union(inferAdditionalConstraints(constraints))
       .union(constructIsNotNullConstraints(constraints))
       .filter(constraint =>
-        constraint.references.nonEmpty && constraint.references.subsetOf(outputSet))
+        constraint.references.nonEmpty && constraint.references.subsetOf(outputSet) &&
+          constraint.deterministic)
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/ac27557e/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala
index 5a76969..8d6a49a 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala
@@ -352,4 +352,21 @@ class ConstraintPropagationSuite extends SparkFunSuite {
     verifyConstraints(tr.analyze.constraints,
       ExpressionSet(Seq(IsNotNull(resolveColumn(tr, "b")), IsNotNull(resolveColumn(tr, "c")))))
   }
+
+  test("not infer non-deterministic constraints") {
+    val tr = LocalRelation('a.int, 'b.string, 'c.int)
+
+    verifyConstraints(tr
+      .where('a.attr === Rand(0))
+      .analyze.constraints,
+      ExpressionSet(Seq(IsNotNull(resolveColumn(tr, "a")))))
+
+    verifyConstraints(tr
+      .where('a.attr === InputFileName())
+      .where('a.attr =!= 'c.attr)
+      .analyze.constraints,
+      ExpressionSet(Seq(resolveColumn(tr, "a") =!= resolveColumn(tr, "c"),
+        IsNotNull(resolveColumn(tr, "a")),
+        IsNotNull(resolveColumn(tr, "c")))))
+  }
 }

