Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 7CE0C200B33 for ; Wed, 29 Jun 2016 13:09:11 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 7B681160A57; Wed, 29 Jun 2016 11:09:11 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 9C352160A4D for ; Wed, 29 Jun 2016 13:09:10 +0200 (CEST) Received: (qmail 7052 invoked by uid 500); 29 Jun 2016 11:09:09 -0000 Mailing-List: contact commits-help@spark.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list commits@spark.apache.org Received: (qmail 7043 invoked by uid 99); 29 Jun 2016 11:09:09 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 29 Jun 2016 11:09:09 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id A9B26E5CE1; Wed, 29 Jun 2016 11:09:09 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: wenchen@apache.org To: commits@spark.apache.org Message-Id: X-Mailer: ASF-Git Admin Mailer Subject: spark git commit: [SPARK-16291][SQL] CheckAnalysis should capture nested aggregate functions that reference no input attributes Date: Wed, 29 Jun 2016 11:09:09 +0000 (UTC) archived-at: Wed, 29 Jun 2016 11:09:11 -0000 Repository: spark Updated Branches: refs/heads/branch-2.0 904122335 -> 1b4d63f6f [SPARK-16291][SQL] CheckAnalysis should capture nested aggregate functions that reference no input attributes ## What changes were proposed in this pull request? `MAX(COUNT(*))` is invalid since aggregate expression can't be nested within another aggregate expression. This case should be captured at analysis phase, but somehow sneaks off to runtime. The reason is that when checking aggregate expressions in `CheckAnalysis`, a checking branch treats all expressions that reference no input attributes as valid ones. However, `MAX(COUNT(*))` is translated into `MAX(COUNT(1))` at analysis phase and also references no input attribute. This PR fixes this issue by removing the aforementioned branch. ## How was this patch tested? New test case added in `AnalysisErrorSuite`. Author: Cheng Lian Closes #13968 from liancheng/spark-16291-nested-agg-functions. (cherry picked from commit d1e8108854deba3de8e2d87eb4389d11fb17ee57) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1b4d63f6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1b4d63f6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1b4d63f6 Branch: refs/heads/branch-2.0 Commit: 1b4d63f6f1e9f5aa819a149e1f5e45bba7d865bb Parents: 9041223 Author: Cheng Lian Authored: Wed Jun 29 19:08:36 2016 +0800 Committer: Wenchen Fan Committed: Wed Jun 29 19:09:00 2016 +0800 ---------------------------------------------------------------------- .../spark/sql/catalyst/analysis/CheckAnalysis.scala | 1 - .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 12 +++++++++++- .../test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 4 +--- 3 files changed, 12 insertions(+), 5 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/spark/blob/1b4d63f6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---------------------------------------------------------------------- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index ac9693e..7b30fcc 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -206,7 +206,6 @@ trait CheckAnalysis extends PredicateHelper { "Add to group by or wrap in first() (or first_value) if you don't care " + "which value you get.") case e if groupingExprs.exists(_.semanticEquals(e)) => // OK - case e if e.references.isEmpty => // OK case e => e.children.foreach(checkValidAggregateExpression) } http://git-wip-us.apache.org/repos/asf/spark/blob/1b4d63f6/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---------------------------------------------------------------------- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala index a41383f..a9cde1e 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala @@ -23,7 +23,7 @@ import org.apache.spark.sql.AnalysisException import org.apache.spark.sql.catalyst.dsl.expressions._ import org.apache.spark.sql.catalyst.dsl.plans._ import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, Count} +import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, Count, Max} import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter, RightOuter} import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData, MapData} @@ -163,6 +163,16 @@ class AnalysisErrorSuite extends AnalysisTest { "Distinct window functions are not supported" :: Nil) errorTest( + "nested aggregate functions", + testRelation.groupBy('a)( + AggregateExpression( + Max(AggregateExpression(Count(Literal(1)), Complete, isDistinct = false)), + Complete, + isDistinct = false)), + "not allowed to use an aggregate function in the argument of another aggregate function." :: Nil + ) + + errorTest( "offset window function", testRelation2.select( WindowExpression( http://git-wip-us.apache.org/repos/asf/spark/blob/1b4d63f6/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---------------------------------------------------------------------- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index b1dbf21..084ba9b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -21,10 +21,8 @@ import java.math.MathContext import java.sql.Timestamp import org.apache.spark.AccumulatorSuite -import org.apache.spark.sql.catalyst.FunctionIdentifier import org.apache.spark.sql.catalyst.analysis.UnresolvedException -import org.apache.spark.sql.catalyst.catalog.{CatalogTestUtils, ExternalCatalog, SessionCatalog} -import org.apache.spark.sql.catalyst.expressions.{ExpressionDescription, SortOrder} +import org.apache.spark.sql.catalyst.expressions.SortOrder import org.apache.spark.sql.catalyst.plans.logical.Aggregate import org.apache.spark.sql.catalyst.util.StringUtils import org.apache.spark.sql.execution.aggregate --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org For additional commands, e-mail: commits-help@spark.apache.org