spark-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From yhuai <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-9251][SQL] do not order by expressions ...
Date Fri, 31 Jul 2015 23:43:23 GMT
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7593#discussion_r36026215
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
    @@ -947,6 +948,63 @@ class Analyzer(
             Project(p.output, newPlan.withNewChildren(newChild :: Nil))
         }
       }
    +
    +  /**
    +   * Removes all still-need-evaluate ordering expressions from sort and use an inner
project to
    +   * materialize them, finally use a outer project to project them away to keep the result
same.
    +   * Then we can make sure we only sort by [[AttributeReference]]s.
    +   *
    +   * As an example,
    +   * {{{
    +   *   Sort('a, 'b + 1,
    +   *     Relation('a, 'b))
    +   * }}}
    +   * will be turned into:
    +   * {{{
    +   *   Project('a, 'b,
    +   *     Sort('a, '_sortCondition,
    +   *       Project('a, 'b, ('b + 1).as("_sortCondition"),
    +   *         Relation('a, 'b))))
    +   * }}}
    +   */
    +  object RemoveEvaluationFromSort extends Rule[LogicalPlan] {
    +    private def hasAlias(expr: Expression) = {
    +      expr.find {
    +        case a: Alias => true
    +        case _ => false
    +      }.isDefined
    +    }
    +
    +    override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      // The ordering expressions have no effect to the output schema of `Sort`,
    +      // so `Alias`s in ordering expressions are unnecessary and we should remove them.
    +      case s @ Sort(ordering, _, _) if ordering.exists(hasAlias) =>
    +        val newOrdering = ordering.map(_.transformUp {
    +          case Alias(child, _) => child
    +        }.asInstanceOf[SortOrder])
    +        s.copy(order = newOrdering)
    +
    +      case s @ Sort(ordering, global, child)
    +        if s.expressions.forall(_.resolved) && s.childrenResolved &&
!s.hasNoEvaluation =>
    +
    +        val (ref, needEval) = ordering.partition(_.child.isInstanceOf[AttributeReference])
    +
    +        val namedExpr = needEval.map(_.child match {
    +          case n: NamedExpression => n
    +          case e => Alias(e, "_sortCondition")()
    +        })
    +
    +        val newOrdering = ref ++ needEval.zip(namedExpr).map { case (order, ne) =>
    --- End diff --
    
    o, it ha not been fixed yet. Will fix it before 1.5 release.
    
    
    
        _____________________________
    From: Wenchen Fan <notifications@github.com>
    Sent: Friday, July 31, 2015 4:08 PM
    Subject: Re: [spark] [SPARK-9251][SQL] do not order by expressions which still need evaluation
(#7593)
    To: apache/spark <spark@noreply.github.com>
    Cc: Yin Huai <yhuai@databricks.com>
    
    
        
    
    In sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
  > +        val newOrdering = ordering.map(_.transformUp {> +          case Alias(child,
_) => child> +        }.asInstanceOf[SortOrder])> +        s.copy(order = newOrdering)>
+> +      case s @ Sort(ordering, global, child)> +        if s.expressions.forall(_.resolved)
&& s.childrenResolved && !s.hasNoEvaluation =>> +> +        val (ref,
needEval) = ordering.partition(_.child.isInstanceOf[AttributeReference])> +> +     
  val namedExpr = needEval.map(_.child match {> +          case n: NamedExpression =>
n> +          case e => Alias(e, "_sortCondition")()> +        })> +> +   
    val newOrdering = ref ++ needEval.zip(namedExpr).map { case (order, ne) =>   
    
    Sorry I was terribly busy these days... thanks for the fix!   
    
    —
    Reply to this email directly or view it on GitHub.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Mime
View raw message