spark-reviews mailing list archives

From mgaido91 <...@git.apache.org>
Subject [GitHub] spark pull request #20333: [SPARK-23087][SQL] CheckCartesianProduct too rest...
Date Sat, 20 Jan 2018 21:20:45 GMT
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20333#discussion_r162793942
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1108,15 +1108,19 @@ object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper {
        */
       def isCartesianProduct(join: Join): Boolean = {
         val conditions = join.condition.map(splitConjunctivePredicates).getOrElse(Nil)
    -    !conditions.map(_.references).exists(refs => refs.exists(join.left.outputSet.contains)
    -        && refs.exists(join.right.outputSet.contains))
    +
    +    conditions match {
    +      case Seq(Literal.FalseLiteral) | Seq(Literal(null, BooleanType)) => false
    +      case _ => !conditions.map(_.references).exists(refs =>
    +        refs.exists(join.left.outputSet.contains) && refs.exists(join.right.outputSet.contains))
    +    }
       }
     
       def apply(plan: LogicalPlan): LogicalPlan =
         if (SQLConf.get.crossJoinEnabled) {
           plan
         } else plan transform {
    -      case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, condition)
    +      case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, _)
    --- End diff --
    
    Why do you say that the size of the result set is the same?
    Suppose a relation A (of size n, let's say 1M rows) is outer-joined with a relation B
    (of size m, let's say 1M rows). If the condition is true, the output relation has 1M * 1M
    (i.e. n * m) rows; if the condition is false, the result has 1M (n) rows for a left join,
    1M (m) for a right join, and 1M + 1M (n + m) for a full outer join. Therefore the sizes
    are not the same at all. But maybe you meant something different; am I missing something?
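
To make the arithmetic above concrete, here is a minimal, Spark-free sketch of the row counts. The names `JoinSizes`, `JoinKind` and `outputRows` are hypothetical and exist only for this illustration; they are not part of Spark or of this pull request. The sketch assumes the join condition is a constant (always true or always false). The inner-join case is not discussed above and is included only for completeness.

```scala
// Hypothetical model of join output sizes under a constant join condition.
// None of these names come from Spark; they are assumptions for illustration.
object JoinSizes {
  sealed trait JoinKind
  case object Inner extends JoinKind
  case object LeftOuter extends JoinKind
  case object RightOuter extends JoinKind
  case object FullOuter extends JoinKind

  /** Rows produced when n left rows are joined with m right rows. */
  def outputRows(kind: JoinKind, n: Long, m: Long, condition: Boolean): Long = {
    // Pairs that satisfy the condition: all of them, or none.
    val matched = if (condition) n * m else 0L
    // Outer joins also emit each unmatched row once, padded with nulls.
    val unmatchedLeft = if (condition && m > 0) 0L else n
    val unmatchedRight = if (condition && n > 0) 0L else m
    kind match {
      case Inner      => matched
      case LeftOuter  => matched + unmatchedLeft
      case RightOuter => matched + unmatchedRight
      case FullOuter  => matched + unmatchedLeft + unmatchedRight
    }
  }
}
```

With n = m = 1M, `outputRows(LeftOuter, n, m, condition = true)` gives n * m rows, while `condition = false` gives n for a left join, m for a right join and n + m for a full outer join, which is the difference described above.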

