spark-reviews mailing list archives

From cloud-fan <...@git.apache.org>
Subject [GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...
Date Fri, 13 Oct 2017 07:37:08 GMT
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r144483555
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,15 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
           // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      val aggregateExpressionMap = resultExpressions.flatMap { expr =>
             expr.collect {
    -          case agg: AggregateExpression => agg
    +          case agg: AggregateExpression => (agg.canonicalized, agg.deterministic) -> agg
    --- End diff --
    
    I think non-deterministic functions should not be deduplicated, e.g. `select max(a + rand()), max(a + rand()) from ...` should still evaluate 2 aggregate functions.
    
    my suggestion:
    ```
    val aggregateExpressions = resultExpressions.flatMap { expr =>
      expr.collect {
        case agg: AggregateExpression => agg
      }
    }
    val aggregateExpressionMap = aggregateExpressions.filter(_.deterministic).map { agg =>
      agg.canonicalized -> agg
    }.toMap
    ```
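    For illustration only (not part of the suggested change): a standalone Scala sketch of the same dedup-by-canonical-form idea, using a hypothetical `Agg` case class with `canonicalKey` and `deterministic` fields in place of the real `AggregateExpression.canonicalized` / `.deterministic` API, to show why only deterministic aggregates can share a single computed copy:
    ```
    // Standalone sketch (plain Scala, not Spark code). `Agg`, `id` and `canonicalKey`
    // are hypothetical stand-ins for AggregateExpression and its canonicalized form.
    object DedupSketch {
      final case class Agg(id: String, canonicalKey: String, deterministic: Boolean)

      def main(args: Array[String]): Unit = {
        val collected = Seq(
          Agg("max(a)#1", "max(a)", deterministic = true),
          Agg("max(a)#2", "max(a)", deterministic = true),               // duplicate of #1
          Agg("max(a+rand())#3", "max(a+rand())", deterministic = false),
          Agg("max(a+rand())#4", "max(a+rand())", deterministic = false) // must stay separate from #3
        )

        // Only deterministic aggregates are safe to share: the same canonical form
        // implies the same result, so references can be rewritten to one computed copy.
        val dedupMap: Map[String, Agg] =
          collected.filter(_.deterministic).map(a => a.canonicalKey -> a).toMap

        // Non-deterministic aggregates are kept as-is, one per occurrence.
        val toEvaluate = dedupMap.values.toSeq ++ collected.filterNot(_.deterministic)

        println(s"evaluating ${toEvaluate.size} aggregates") // prints 3, not 2 or 4
      }
    }
    ```
    Here the two `max(a)` occurrences collapse into one, while the two `max(a + rand())` occurrences are both evaluated, matching the example above.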

