From issues-return-212340-archive-asf-public=cust-asf.ponee.io@spark.apache.org Sat Dec 22 19:39:06 2018
Date: Sat, 22 Dec 2018 18:39:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-26402) Accessing nested fields with different cases in case insensitive mode

    [ https://issues.apache.org/jira/browse/SPARK-26402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727529#comment-16727529 ]

ASF GitHub Bot commented on SPARK-26402:
----------------------------------------

asfgit closed pull request #23353: [SPARK-26402][SQL] Accessing nested fields with different cases in case insensitive mode
URL: https://github.com/apache/spark/pull/23353

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:
As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
index fe6db8b344d3d..4d218b936b3a2 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
@@ -26,6 +26,7 @@ package org.apache.spark.sql.catalyst.expressions
  *
  * The following rules are applied:
  *  - Names and nullability hints for [[org.apache.spark.sql.types.DataType]]s are stripped.
+ *  - Names for [[GetStructField]] are stripped.
  *  - Commutative and associative operations ([[Add]] and [[Multiply]]) have their children ordered
  *    by `hashCode`.
  *  - [[EqualTo]] and [[EqualNullSafe]] are reordered by `hashCode`.
@@ -37,10 +38,11 @@ object Canonicalize {
     expressionReorder(ignoreNamesTypes(e))
   }
 
-  /** Remove names and nullability from types. */
+  /** Remove names and nullability from types, and names from `GetStructField`. */
   private[expressions] def ignoreNamesTypes(e: Expression): Expression = e match {
     case a: AttributeReference =>
       AttributeReference("none", a.dataType.asNullable)(exprId = a.exprId)
+    case GetStructField(child, ordinal, Some(_)) => GetStructField(child, ordinal, None)
     case _ => e
   }
 
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
index 28e6940f3cca3..9802a6e5891b8 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.plans.logical.Range
+import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
 
 class CanonicalizeSuite extends SparkFunSuite {
 
@@ -50,4 +51,32 @@ class CanonicalizeSuite extends SparkFunSuite {
     assert(range.where(arrays1).sameResult(range.where(arrays2)))
     assert(!range.where(arrays1).sameResult(range.where(arrays3)))
   }
+
+  test("SPARK-26402: accessing nested fields with different cases in case insensitive mode") {
+    val expId = NamedExpression.newExprId
+    val qualifier = Seq.empty[String]
+    val structType = StructType(
+      StructField("a", StructType(StructField("b", IntegerType, false) :: Nil), false) :: Nil)
+
+    // GetStructField with different names are semantically equal
+    val fieldA1 = GetStructField(
+      AttributeReference("data1", structType, false)(expId, qualifier),
+      0, Some("a1"))
+    val fieldA2 = GetStructField(
+      AttributeReference("data2", structType, false)(expId, qualifier),
+      0, Some("a2"))
+    assert(fieldA1.semanticEquals(fieldA2))
+
+    val fieldB1 = GetStructField(
+      GetStructField(
+        AttributeReference("data1", structType, false)(expId, qualifier),
+        0, Some("a1")),
+      0, Some("b1"))
+    val fieldB2 = GetStructField(
+      GetStructField(
+        AttributeReference("data2", structType, false)(expId, qualifier),
+        0, Some("a2")),
+      0, Some("b2"))
Some("b2")) + assert(fieldB1.semanticEquals(fieldB2)) + } } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/opti= mizer/BinaryComparisonSimplificationSuite.scala b/sql/catalyst/src/test/sca= la/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSu= ite.scala index a313681eeb8f0..5794691a365a9 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/B= inaryComparisonSimplificationSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/B= inaryComparisonSimplificationSuite.scala @@ -25,6 +25,7 @@ import org.apache.spark.sql.catalyst.expressions.Literal.= {FalseLiteral, TrueLite import org.apache.spark.sql.catalyst.plans.PlanTest import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.catalyst.rules._ +import org.apache.spark.sql.types.{IntegerType, StructField, StructType} =20 class BinaryComparisonSimplificationSuite extends PlanTest with PredicateH= elper { =20 @@ -92,4 +93,33 @@ class BinaryComparisonSimplificationSuite extends PlanTe= st with PredicateHelper val correctAnswer =3D nonNullableRelation.analyze comparePlans(actual, correctAnswer) } + + test("SPARK-26402: accessing nested fields with different cases in case = insensitive mode") { + val expId =3D NamedExpression.newExprId + val qualifier =3D Seq.empty[String] + val structType =3D StructType( + StructField("a", StructType(StructField("b", IntegerType, false) :: = Nil), false) :: Nil) + + val fieldA1 =3D GetStructField( + GetStructField( + AttributeReference("data1", structType, false)(expId, qualifier), + 0, Some("a1")), + 0, Some("b1")) + val fieldA2 =3D GetStructField( + GetStructField( + AttributeReference("data2", structType, false)(expId, qualifier), + 0, Some("a2")), + 0, Some("b2")) + + // GetStructField with different names are semantically equal; thus, `= EqualTo(fieldA1, fieldA2)` + // will be optimized to `TrueLiteral` by `SimplifyBinaryComparison`. + val originalQuery =3D nonNullableRelation + .where(EqualTo(fieldA1, fieldA2)) + .analyze + + val optimized =3D Optimize.execute(originalQuery) + val correctAnswer =3D nonNullableRelation.analyze + + comparePlans(optimized, correctAnswer) + } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.sca= la b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index 37a8815350a53..656da9fa01806 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -2937,6 +2937,25 @@ class SQLQuerySuite extends QueryTest with SharedSQL= Context { } } } + + test("SPARK-26402: accessing nested fields with different cases in case = insensitive mode") { + withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { + val msg =3D intercept[AnalysisException] { + withTable("t") { + sql("create table t (s struct) using json") + checkAnswer(sql("select s.I from t group by s.i"), Nil) + } + }.message + assert(msg.contains("No such struct field I in i")) + } + + withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + withTable("t") { + sql("create table t (s struct) using json") + checkAnswer(sql("select s.I from t group by s.i"), Nil) + } + } + } } =20 case class Foo(bar: Option[String]) =20 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 

> Accessing nested fields with different cases in case insensitive mode
> ---------------------------------------------------------------------
>
>                 Key: SPARK-26402
>                 URL: https://issues.apache.org/jira/browse/SPARK-26402
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: DB Tsai
>            Assignee: DB Tsai
>            Priority: Major
>
> {{GetStructField}} with different optional names should be semantically equal. We will use this as a building block to compare the nested fields used in the plans to be optimized by the Catalyst optimizer.
> This PR also fixes a bug where accessing nested fields with different cases in case insensitive mode results in an {{AnalysisException}}:
> {code:java}
> sql("create table t (s struct<i: Int>) using json")
> sql("select s.I from t group by s.i")
> {code}
> which currently fails with
> {code:java}
> org.apache.spark.sql.AnalysisException: expression 'default.t.`s`' is neither present in the group by, nor is it an aggregate function
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
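To see the user-facing effect end to end, here is a minimal spark-shell style sketch replaying the reproduction from the description above under both settings of `spark.sql.caseSensitive`. It assumes a Spark build that includes this fix; the local session setup and the table name `t` are illustrative only.

{code:scala}
import org.apache.spark.sql.SparkSession

// In spark-shell a `spark` session already exists; this builder is for a standalone run.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-26402 repro")
  .getOrCreate()

// Case insensitive mode (the default): after this fix, `s.I` in the SELECT list
// and `s.i` in the GROUP BY resolve to the same nested field, so the query
// analyzes successfully (and returns no rows, since the table is empty).
spark.conf.set("spark.sql.caseSensitive", "false")
spark.sql("create table t (s struct<i: Int>) using json")
spark.sql("select s.I from t group by s.i").show()

// Case sensitive mode still rejects the mismatched capitalization with
// an AnalysisException ("No such struct field I in i").
spark.conf.set("spark.sql.caseSensitive", "true")
// spark.sql("select s.I from t group by s.i")   // would throw AnalysisException
{code}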