spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] viirya commented on a change in pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
Date Tue, 01 Sep 2020 06:08:22 GMT

viirya commented on a change in pull request #28490:
URL: https://github.com/apache/spark/pull/28490#discussion_r480844760



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1325,24 +1325,43 @@ class Analyzer(
      *
      * Note : In this routine, the unresolved attributes are resolved from the input plan's
      * children attributes.
+     *
+     * @param e the expression need to be resolved.
+     * @param q the LogicalPlan use to resolve expression's attribute from.
+     * @param trimAlias whether need to trim alias of Struct field. When true, we will trim
+     *                  Struct field alias. When isTopLevel = true, we won't trim top-level
+     *                  Struct field alias.
+     * @param isTopLevel whether need to trim top-level alias of Struct field. this param is

Review comment:
       nit: this -> This

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1325,24 +1325,43 @@ class Analyzer(
      *
      * Note : In this routine, the unresolved attributes are resolved from the input plan's
      * children attributes.
+     *
+     * @param e the expression need to be resolved.
+     * @param q the LogicalPlan use to resolve expression's attribute from.

Review comment:
       nit: use -> used

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1325,24 +1325,43 @@ class Analyzer(
      *
      * Note : In this routine, the unresolved attributes are resolved from the input plan's
      * children attributes.
+     *
+     * @param e the expression need to be resolved.
+     * @param q the LogicalPlan use to resolve expression's attribute from.
+     * @param trimAlias whether need to trim alias of Struct field. When true, we will trim
+     *                  Struct field alias. When isTopLevel = true, we won't trim top-level
+     *                  Struct field alias.
+     * @param isTopLevel whether need to trim top-level alias of Struct field. this param is
+     *                   controlled by this method itself to make sure we won't trim top-level
+     *                   Struct field alias. If need to trim top-level Struct field alias,
+     *                   we can do that outside of this method.

Review comment:
       Can we rephrase this param doc too? Do you mean that this param is used by this method
to know whether it is resolving a top-level expression, and that if it is top-level, we skip
trimming the alias of the struct field?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1325,24 +1325,43 @@ class Analyzer(
      *
      * Note : In this routine, the unresolved attributes are resolved from the input plan's
      * children attributes.
+     *
+     * @param e the expression need to be resolved.
+     * @param q the LogicalPlan use to resolve expression's attribute from.
+     * @param trimAlias whether need to trim alias of Struct field. When true, we will trim
+     *                  Struct field alias. When isTopLevel = true, we won't trim top-level
+     *                  Struct field alias.

Review comment:
       This param doc reads oddly. Do you mean that when `trimAlias` is true, the method will
trim the alias of a struct field, but it won't trim the alias if it is a top-level expression?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1428,8 +1428,46 @@ class Analyzer(
       // SPARK-25942: Resolves aggregate expressions with `AppendColumns`'s children, instead of
       // `AppendColumns`, because `AppendColumns`'s serializer might produce conflict attribute
       // names leading to ambiguous references exception.
-      case a @ Aggregate(groupingExprs, aggExprs, appendColumns: AppendColumns) =>
-        a.mapExpressions(resolveExpressionTopDown(_, appendColumns))
+      case a: Aggregate =>
+        val planForResolve = a.child match {
+          case appendColumns: AppendColumns => appendColumns
+          case _ => a
+        }
+
+        val resolvedGroupingExprs =
+          a.groupingExpressions.map(resolveExpressionTopDown(_, planForResolve))
+            .map(trimStructFieldAlias)
+
+        val resolvedAggExprs = a.aggregateExpressions
+          .map(resolveExpressionTopDown(_, planForResolve))
+          .map {

Review comment:
       Hmm, where is `trimNonTopLevelStructFieldAlias`?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1425,11 +1444,48 @@ class Analyzer(
       // rule: ResolveDeserializer.
       case plan if containsDeserializer(plan.expressions) => plan
 
-      // SPARK-25942: Resolves aggregate expressions with `AppendColumns`'s children, instead of
-      // `AppendColumns`, because `AppendColumns`'s serializer might produce conflict attribute
-      // names leading to ambiguous references exception.
-      case a @ Aggregate(groupingExprs, aggExprs, appendColumns: AppendColumns) =>
-        a.mapExpressions(resolveExpressionTopDown(_, appendColumns))
+      case a: Aggregate =>
+        val planForResolve = a.child match {
+          case appendColumns: AppendColumns => appendColumns
+          case _ => a
+        }
+
+        val resolvedGroupingExprs = a.groupingExpressions
+          .map(resolveExpressionTopDown(_, planForResolve, trimAlias = true))
+          .map {
+            // trim Alias over top-level GetStructField
+            case Alias(s: GetStructField, _) => s
+            case other => other
+          }
+
+        val resolvedAggExprs = a.aggregateExpressions
+          .map(resolveExpressionTopDown(_, planForResolve, trimAlias = true))
+            .map(_.asInstanceOf[NamedExpression])
+
+        a.copy(resolvedGroupingExprs, resolvedAggExprs, a.child)
+
+      case g: GroupingSets =>
+        val resolvedSelectedExprs = g.selectedGroupByExprs
+          .map(_.map(resolveExpressionTopDown(_, g, trimAlias = true))
+            .map {
+              // trim Alias over top-level GetStructField
+              case Alias(s: GetStructField, _) => s
+              case other => other
+            })

Review comment:
       This is somewhat hard for a reader to understand. Why do we need to trim the alias for
these expressions? Can you explain, or maybe add an example as a comment in the code?
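
       To illustrate the kind of example being asked for, here is a minimal sketch. It is a toy
model, not Catalyst code: the `Expr` classes, `resolve`, `trimAll`, `trimNonTopLevel` and `contains`
are all hypothetical stand-ins, the integer `id` only imitates an ExprId, and plain structural
equality stands in for however the analyzer really compares expressions. Under those assumptions,
it shows how resolving `a.b` twice could yield two aliases that differ only by id, so an aggregate
expression such as `a.b + 1` no longer contains the grouping expression until the inner aliases
are trimmed.

```scala
object TrimAliasSketch {
  // Toy stand-ins for Catalyst expressions; NOT the real Spark classes.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class GetStructField(child: Expr, field: String) extends Expr
  case class Alias(child: Expr, name: String, id: Int) extends Expr // id imitates an ExprId
  case class Add(left: Expr, right: Expr) extends Expr

  private var nextId = 0
  def freshId(): Int = { nextId += 1; nextId }

  // Hypothetical resolver: "a.b" becomes an aliased struct field access with a fresh id.
  def resolve(name: String): Expr = name.split('.') match {
    case Array(col, field) => Alias(GetStructField(Attr(col), field), field, freshId())
    case _ => Attr(name)
  }

  // Remove every Alias that directly wraps a GetStructField, at any depth.
  def trimAll(e: Expr): Expr = e match {
    case Alias(s: GetStructField, _, _) => trimAll(s)
    case Alias(c, n, id) => Alias(trimAll(c), n, id)
    case GetStructField(c, f) => GetStructField(trimAll(c), f)
    case Add(l, r) => Add(trimAll(l), trimAll(r))
    case leaf => leaf
  }

  // Same, but keep an Alias at the root so the output column keeps its name.
  def trimNonTopLevel(e: Expr): Expr = e match {
    case Alias(c, n, id) => Alias(trimAll(c), n, id)
    case other => trimAll(other)
  }

  // True if `target` occurs anywhere inside `e` (structural equality).
  def contains(e: Expr, target: Expr): Boolean = e == target || (e match {
    case Alias(c, _, _) => contains(c, target)
    case GetStructField(c, _) => contains(c, target)
    case Add(l, r) => contains(l, target) || contains(r, target)
    case _ => false
  })

  def main(args: Array[String]): Unit = {
    val grouping = resolve("a.b")                                   // GROUP BY a.b
    val agg = Alias(Add(resolve("a.b"), Lit(1)), "c", freshId())    // SELECT a.b + 1 AS c
    println(contains(agg, grouping))                                // compared as resolved
    println(contains(trimNonTopLevel(agg), trimAll(grouping)))      // compared after trimming
  }
}
```

In this sketch the grouping expression is trimmed completely, while the aggregate expression keeps
its outermost alias, which mirrors the top-level versus non-top-level distinction discussed in the
param docs above.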




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org




