Date: Thu, 8 Oct 2015 06:58:26 +0000 (UTC)
From: "Reynold Xin (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Updated] (SPARK-10960) SQL with windowing function cannot reference column in inner select block

     [ https://issues.apache.org/jira/browse/SPARK-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-10960:
--------------------------------
    Description: 

There appears to be a bug in Spark SQL's analyzer when windowing functions are used. Specifically, when the outer SELECT refers to a column from an inner select block, the analyzer throws an error. Here is an example.

When I use a windowing function and add a literal 1 to its result,

{code}
select Rank() OVER ( ORDER BY D1.c3 ) + 1 as c1
{code}

Spark SQL accepts the query. 
The whole SQL is:

{code}
select Rank() OVER ( ORDER BY D1.c3 ) + 1 as c1,
       D1.c3 as c3,
       D1.c4 as c4,
       D1.c5 as c5
from
  (select T3671.ROW_WID as c3,
          T3671.CAL_MONTH as c4,
          T3671.CAL_YEAR as c5,
          1 as c6
   from
          W_DAY_D T3671
  ) D1
{code}

However, if I change the projection to refer to a column from the inner select block, D1.C6, whose value is itself the literal 1 (so the query is functionally equivalent to the one above), Spark SQL throws an error:

{code}
select Rank() OVER ( ORDER BY D1.c3 ) + D1.C6 as c1,
       D1.c3 as c3,
       D1.c4 as c4,
       D1.c5 as c5
from
  (select T3671.ROW_WID as c3,
          T3671.CAL_MONTH as c4,
          T3671.CAL_YEAR as c5,
          1 as c6
   from
          W_DAY_D T3671
  ) D1
{code}

The error message is:

{code}
java.lang.NullPointerException
Error: org.apache.spark.sql.AnalysisException: resolved attribute(s) c6#3386 missing from c5#3390,c3#3383,c4#3389,_we0#3461,c3#3388 in operator !Project [c3#3388,c4#3389,c5#3390,c3#3383,_we0#3461,(_we0#3461 + c6#3386) AS c1#3387]; (state=,code=0)
{code}

The above example is a simplified version of the SQL I was testing. 
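As a sanity check that the rejected query is valid SQL, its shape can be reproduced outside Spark. The sketch below runs the same simplified query, plus a hypothetical rewrite that computes the rank in its own select block, against SQLite (which supports window functions since 3.25) as a stand-in engine; the W_DAY_D rows are made up for illustration, and whether the rewrite actually sidesteps the Spark analyzer error is untested.

```python
import sqlite3

# Tiny stand-in for W_DAY_D with made-up rows (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE W_DAY_D (ROW_WID INTEGER, CAL_MONTH INTEGER, CAL_YEAR INTEGER)")
conn.executemany("INSERT INTO W_DAY_D VALUES (?, ?, ?)",
                 [(1, 1, 2015), (2, 2, 2015), (3, 1, 2014)])

# The query Spark rejects: a window expression added to the inner-block
# column c6. SQLite (>= 3.25) accepts it, suggesting the failure is
# Spark-specific rather than a problem with the SQL itself.
direct = conn.execute("""
    select Rank() OVER ( ORDER BY D1.c3 ) + D1.c6 as c1,
           D1.c3 as c3, D1.c4 as c4, D1.c5 as c5
    from (select T3671.ROW_WID as c3,
                 T3671.CAL_MONTH as c4,
                 T3671.CAL_YEAR as c5,
                 1 as c6
          from W_DAY_D T3671) D1
""").fetchall()

# Hypothetical workaround (untested against Spark): push the window
# function into its own select block, so the outer projection only adds
# two plain columns.
rewritten = conn.execute("""
    select D2.r + D2.c6 as c1, D2.c3, D2.c4, D2.c5
    from (select Rank() OVER ( ORDER BY D1.c3 ) as r,
                 D1.c3 as c3, D1.c4 as c4, D1.c5 as c5, D1.c6 as c6
          from (select T3671.ROW_WID as c3,
                       T3671.CAL_MONTH as c4,
                       T3671.CAL_YEAR as c5,
                       1 as c6
                from W_DAY_D T3671) D1) D2
""").fetchall()

print(direct)     # c1 values are 2, 3, 4 (rank over c3 = 1..3, plus c6 = 1)
print(rewritten)  # same c1 values from the rewritten form
```

Both statements produce identical results on SQLite, which matches the reporter's point that the two projections are functionally equivalent.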
The full SQL I was using, which fails with a similar error, is as follows:

{code}
select Case when case D1.c6 when 1 then D1.c3 else NULL end is not null then Rank() OVER ( ORDER BY case when ( case D1.c6 when 1 then D1.c3 else NULL end ) is null then 1 else 0 end, case D1.c6 when 1 then D1.c3 else NULL end ) end as c1,
       Case when case D1.c7 when 1 then D1.c3 else NULL end is not null then Rank() OVER ( PARTITION BY D1.c4, D1.c5 ORDER BY case when ( case D1.c7 when 1 then D1.c3 else NULL end ) is null then 1 else 0 end, case D1.c7 when 1 then D1.c3 else NULL end ) end as c2,
       D1.c3 as c3,
       D1.c4 as c4,
       D1.c5 as c5
from
  (select T3671.ROW_WID as c3,
          T3671.CAL_MONTH as c4,
          T3671.CAL_YEAR as c5,
          ROW_NUMBER() OVER (PARTITION BY T3671.CAL_MONTH, T3671.CAL_YEAR ORDER BY T3671.CAL_MONTH DESC, T3671.CAL_YEAR DESC) as c6,
          ROW_NUMBER() OVER (PARTITION BY T3671.CAL_MONTH, T3671.CAL_YEAR, T3671.ROW_WID ORDER BY T3671.CAL_MONTH DESC, T3671.CAL_YEAR DESC, T3671.ROW_WID DESC) as c7
   from
          W_DAY_D T3671
  ) D1
{code}

Hopefully, once this is fixed, both sample SQL statements above will work.


> SQL with windowing function cannot reference column in inner select block
> -------------------------------------------------------------------------
>
>                 Key: SPARK-10960
>                 URL: https://issues.apache.org/jira/browse/SPARK-10960
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0, 1.5.0
>            Reporter: David Wong


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)