From: rxin
To: reviews@spark.apache.org
Reply-To: reviews@spark.apache.org
Subject: [GitHub] spark pull request: [SPARK-12719][SQL] SQL generation support for ...
Content-Type: text/plain
Message-Id: <20160315063358.71C4CE098D@git1-us-west.apache.org>
Date: Tue, 15 Mar 2016 06:33:58 +0000 (UTC)

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11696#discussion_r56120392

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -316,31 +345,73 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
           // `Aggregate`s.
           CollapseProject),
         Batch("Recover Scoping Info", Once,
    -      // Used to handle other auxiliary `Project`s added by analyzer (e.g.
    -      // `ResolveAggregateFunctions` rule)
    -      AddSubquery,
    -      // Previous rule will add extra sub-queries, this rule is used to re-propagate and update
    -      // the qualifiers bottom up, e.g.:
    -      //
    -      // Sort
    -      //   ordering = t1.a
    -      //   Project
    -      //     projectList = [t1.a, t1.b]
    -      //     Subquery gen_subquery
    -      //       child ...
    -      //
    -      // will be transformed to:
    -      //
    -      // Sort
    -      //   ordering = gen_subquery.a
    -      //   Project
    -      //     projectList = [gen_subquery.a, gen_subquery.b]
    -      //     Subquery gen_subquery
    -      //       child ...
    -      UpdateQualifiers
    +      // Remove all sub queries, as we will insert new ones when it's necessary.
    +      EliminateSubqueryAliases,
    +      // A logical plan is allowed to have same-name outputs with different qualifiers(e.g. the
    +      // `Join` operator). However, this kind of plan can't be put under a sub query as we will
    +      // erase and assign a new qualifier to all outputs and make it impossible to distinguish
    +      // same-name outputs. This rule renames all attributes, to guarantee different
    +      // attributes(with different exprId) always have different names. It also removes all
    +      // qualifiers, as attributes have unique names now and we don't need qualifiers to resolve
    +      // ambiguity.
    +      NormalizedAttribute,
    +      // Wraps table information with SQLTable, and combine `Sample` operator if there are any.
    +      ResolveSQLTable,
    +      // Re-order operators to let them generate legal SQL string, e.g. we should push down
    +      // `Generate` through `Filter`. We should enrich this rule in the future.
    +      ReOrderOperators,
    --- End diff --

    let's put your comment actually in the code so we know why we need to re-order the operators
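
As context for the `NormalizedAttribute` comment in the diff above, here is a minimal sketch of the renaming idea it describes: give every attribute a name derived from its expression id, then drop the qualifier. It is a toy model, not the code in the pull request. The `Attr` case class, the `NormalizeSketch` object, and the `gen_attr_` prefix are all hypothetical names chosen for illustration, and none of Spark's Catalyst classes are used.

```scala
// Toy stand-in for an attribute (hypothetical, not Spark's AttributeReference):
// a column name, a unique expression id, and an optional qualifier such as a
// table or sub-query alias.
final case class Attr(name: String, exprId: Long, qualifier: Option[String])

object NormalizeSketch {
  // Derive the new name from the expression id, so two attributes with
  // different ids can never share a name, and drop the qualifier because it
  // is no longer needed to resolve ambiguity.
  // The "gen_attr_" prefix is an assumption made for this sketch.
  def normalize(attr: Attr): Attr =
    attr.copy(name = s"gen_attr_${attr.exprId}", qualifier = None)

  def main(args: Array[String]): Unit = {
    // Output of a join: both sides expose a column called "a", told apart
    // only by their qualifiers.
    val joinOutput = Seq(Attr("a", 1L, Some("t1")), Attr("a", 2L, Some("t2")))

    val normalized = joinOutput.map(normalize)
    normalized.foreach(println)

    // After normalization the names alone are enough to tell the columns apart.
    assert(normalized.map(_.name).distinct.size == normalized.size)
  }
}
```

With names made unique this way, such a plan could be wrapped in a newly generated sub-query without same-name outputs becoming indistinguishable, which is the problem the diff comment describes.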