spark-issues mailing list archives

From "Wenchen Fan (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-22266) The same aggregate function was evaluated multiple times
Date Wed, 18 Oct 2017 13:15:01 GMT

     [ https://issues.apache.org/jira/browse/SPARK-22266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-22266:
-----------------------------------

    Assignee: Maryann Xue

> The same aggregate function was evaluated multiple times
> --------------------------------------------------------
>
>                 Key: SPARK-22266
>                 URL: https://issues.apache.org/jira/browse/SPARK-22266
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> We should avoid evaluating the same aggregate function more than once, as the code comment below (patterns.scala:206) states. However, this did not work as expected.
> {code}
>       // A single aggregate expression might appear multiple times in resultExpressions.
>       // In order to avoid evaluating an individual aggregate function multiple times, we'll
>       // build a set of the distinct aggregate expressions and build a function which can
>       // be used to re-write expressions so that they reference the single copy of the
>       // aggregate function which actually gets computed.
> {code}
> For example, the physical plan of
> {code}
> SELECT a, max(b+1), max(b+1) + 1 FROM testData2 GROUP BY a
> {code}
> was
> {code}
> HashAggregate(keys=[a#23], functions=[max((b#24 + 1)), max((b#24 + 1))], output=[a#23, max((b + 1))#223, (max((b + 1)) + 1)#224])
> +- HashAggregate(keys=[a#23], functions=[partial_max((b#24 + 1)), partial_max((b#24 + 1))], output=[a#23, max#231, max#232])
>    +- SerializeFromObject [assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true]).a AS a#23, assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true]).b AS b#24]
>       +- Scan ExternalRDDScan[obj#22]
> {code}
> In each HashAggregate there were two identical aggregate functions, "max(b#24 + 1)".
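
The idea described in the quoted code comment (collect the distinct aggregate expressions, then rewrite the result expressions to reference the single computed copy) can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's implementation: the classes `Col`, `Lit`, `Add`, `Max`, `Ref` and the functions `collect_aggregates`, `rewrite`, and `plan` are hypothetical names invented for this example, and structural equality of frozen dataclasses stands in for whatever equivalence check the planner actually uses.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Toy expression tree. These classes are hypothetical and are NOT Catalyst types.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Max:
    child: "Expr"


@dataclass(frozen=True)
class Ref:
    index: int  # position of the single computed copy of an aggregate


Expr = Union[Col, Lit, Add, Max, Ref]


def collect_aggregates(expr: Expr, found: List[Max]) -> None:
    """Append each aggregate in expr to found, skipping structural duplicates."""
    if isinstance(expr, Max):
        if expr not in found:
            found.append(expr)
    elif isinstance(expr, Add):
        collect_aggregates(expr.left, found)
        collect_aggregates(expr.right, found)


def rewrite(expr: Expr, found: List[Max]) -> Expr:
    """Replace each aggregate in expr with a Ref to its single copy."""
    if isinstance(expr, Max):
        return Ref(found.index(expr))
    if isinstance(expr, Add):
        return Add(rewrite(expr.left, found), rewrite(expr.right, found))
    return expr


def plan(result_exprs: List[Expr]) -> Tuple[List[Max], List[Expr]]:
    """Return the distinct aggregates and the rewritten result expressions."""
    distinct: List[Max] = []
    for e in result_exprs:
        collect_aggregates(e, distinct)
    return distinct, [rewrite(e, distinct) for e in result_exprs]


# Models: SELECT a, max(b+1), max(b+1) + 1 FROM testData2 GROUP BY a
# The two aggregates are built separately, as a parser would produce them.
first = Max(Add(Col("b"), Lit(1)))
second = Max(Add(Col("b"), Lit(1)))
distinct, rewritten = plan([Col("a"), first, Add(second, Lit(1))])
print(len(distinct))  # number of aggregate functions that would be evaluated
print(rewritten)
```

In this model the two `max(b+1)` occurrences compare equal, so only one aggregate is kept and both result expressions reference it. The plan quoted above shows the opposite outcome, with two copies of the function in each HashAggregate; the sketch does not attempt to explain why the deduplication failed in Spark.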




