Date: Thu, 8 Oct 2015 06:58:26 +0000 (UTC)
From: "Reynold Xin (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Updated] (SPARK-10960) SQL with windowing function cannot reference column in inner select block

     [ https://issues.apache.org/jira/browse/SPARK-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-10960:
--------------------------------
    Description: 

There appears to be a bug in Spark SQL's analyzer when windowing functions are used. Specifically, when the outer SELECT refers to a column from an inner select block, the analyzer throws an error. Here is an example.

When I use a windowing function and add a literal 1 to its result,

{code}
select Rank() OVER ( ORDER BY D1.c3 ) + 1 as c1
{code}

Spark SQL accepts the query. 
The whole SQL is:

{code}
select Rank() OVER ( ORDER BY D1.c3 ) + 1 as c1,
       D1.c3 as c3,
       D1.c4 as c4,
       D1.c5 as c5
from
  (select T3671.ROW_WID as c3,
          T3671.CAL_MONTH as c4,
          T3671.CAL_YEAR as c5,
          1 as c6
   from
          W_DAY_D T3671
  ) D1
{code}

However, if I change the projection to refer to a column from the inner select block, D1.C6, whose value is itself the literal 1 (so the query is functionally equivalent to the one above), Spark SQL throws an error:

{code}
select Rank() OVER ( ORDER BY D1.c3 ) + D1.C6 as c1,
       D1.c3 as c3,
       D1.c4 as c4,
       D1.c5 as c5
from
  (select T3671.ROW_WID as c3,
          T3671.CAL_MONTH as c4,
          T3671.CAL_YEAR as c5,
          1 as c6
   from
          W_DAY_D T3671
  ) D1
{code}

The error message is:

{code}
java.lang.NullPointerException
Error: org.apache.spark.sql.AnalysisException: resolved attribute(s) c6#3386 missing from c5#3390,c3#3383,c4#3389,_we0#3461,c3#3388 in operator !Project [c3#3388,c4#3389,c5#3390,c3#3383,_we0#3461,(_we0#3461 + c6#3386) AS c1#3387]; (state=,code=0)
{code}

The above example is a simplified version of the SQL I was testing. 
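As a sanity check that the rejected query is valid SQL, its shape can be reproduced outside Spark. The sketch below runs the same simplified query, plus a hypothetical rewrite that computes the rank in its own select block, against SQLite (which supports window functions since 3.25) as a stand-in engine; the W_DAY_D rows are made up for illustration, and whether the rewrite actually sidesteps the Spark analyzer error is untested.

```python
import sqlite3

# Tiny stand-in for W_DAY_D with made-up rows (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE W_DAY_D (ROW_WID INTEGER, CAL_MONTH INTEGER, CAL_YEAR INTEGER)")
conn.executemany("INSERT INTO W_DAY_D VALUES (?, ?, ?)",
                 [(1, 1, 2015), (2, 2, 2015), (3, 1, 2014)])

# The query Spark rejects: a window expression added to the inner-block
# column c6. SQLite (>= 3.25) accepts it, suggesting the failure is
# Spark-specific rather than a problem with the SQL itself.
direct = conn.execute("""
    select Rank() OVER ( ORDER BY D1.c3 ) + D1.c6 as c1,
           D1.c3 as c3, D1.c4 as c4, D1.c5 as c5
    from (select T3671.ROW_WID as c3,
                 T3671.CAL_MONTH as c4,
                 T3671.CAL_YEAR as c5,
                 1 as c6
          from W_DAY_D T3671) D1
""").fetchall()

# Hypothetical workaround (untested against Spark): push the window
# function into its own select block, so the outer projection only adds
# two plain columns.
rewritten = conn.execute("""
    select D2.r + D2.c6 as c1, D2.c3, D2.c4, D2.c5
    from (select Rank() OVER ( ORDER BY D1.c3 ) as r,
                 D1.c3 as c3, D1.c4 as c4, D1.c5 as c5, D1.c6 as c6
          from (select T3671.ROW_WID as c3,
                       T3671.CAL_MONTH as c4,
                       T3671.CAL_YEAR as c5,
                       1 as c6
                from W_DAY_D T3671) D1) D2
""").fetchall()

print(direct)     # c1 values are 2, 3, 4 (rank over c3 = 1..3, plus c6 = 1)
print(rewritten)  # same c1 values from the rewritten form
```

Both statements produce identical results on SQLite, which matches the reporter's point that the two projections are functionally equivalent.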
The full SQL I was using, which fails with a similar error, is as follows:

{code}
select Case when case D1.c6 when 1 then D1.c3 else NULL end is not null then Rank() OVER ( ORDER BY case when ( case D1.c6 when 1 then D1.c3 else NULL end ) is null then 1 else 0 end, case D1.c6 when 1 then D1.c3 else NULL end ) end as c1,
       Case when case D1.c7 when 1 then D1.c3 else NULL end is not null then Rank() OVER ( PARTITION BY D1.c4, D1.c5 ORDER BY case when ( case D1.c7 when 1 then D1.c3 else NULL end ) is null then 1 else 0 end, case D1.c7 when 1 then D1.c3 else NULL end ) end as c2,
       D1.c3 as c3,
       D1.c4 as c4,
       D1.c5 as c5
from
  (select T3671.ROW_WID as c3,
          T3671.CAL_MONTH as c4,
          T3671.CAL_YEAR as c5,
          ROW_NUMBER() OVER (PARTITION BY T3671.CAL_MONTH, T3671.CAL_YEAR ORDER BY T3671.CAL_MONTH DESC, T3671.CAL_YEAR DESC) as c6,
          ROW_NUMBER() OVER (PARTITION BY T3671.CAL_MONTH, T3671.CAL_YEAR, T3671.ROW_WID ORDER BY T3671.CAL_MONTH DESC, T3671.CAL_YEAR DESC, T3671.ROW_WID DESC) as c7
   from
          W_DAY_D T3671
  ) D1
{code}

Hopefully, once this is fixed, both sample SQL statements above will work.


> SQL with windowing function cannot reference column in inner select block
> -------------------------------------------------------------------------
>
>                 Key: SPARK-10960
>                 URL: https://issues.apache.org/jira/browse/SPARK-10960
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0, 1.5.0
>            Reporter: David Wong


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)