spark-issues mailing list archives

From "Xiao Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-24424) Support ANSI-SQL compliant syntax for ROLLUP, CUBE and GROUPING SET
Date Wed, 30 May 2018 06:38:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-24424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-24424:
----------------------------
    Description: 
Currently, our GROUP BY clause follows the Hive syntax [https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup].
However, this syntax is not ANSI SQL compliant. The proposal is to update our parser and
analyzer to accept the ANSI-compliant syntax as well.
For example, the Hive-style syntax we support today is:
{code:SQL}
GROUP BY col1, col2 WITH ROLLUP

GROUP BY col1, col2 WITH CUBE

GROUP BY col1, col2 GROUPING SETS ...
{code}
It would be good to support the ANSI SQL syntax at the same time:
{code:SQL}
GROUP BY ROLLUP(col1, col2)

GROUP BY CUBE(col1, col2)

GROUP BY GROUPING SETS(...)
{code}
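
To make the equivalence between the two spellings concrete, here is a minimal Python sketch of the standard SQL meaning of ROLLUP and CUBE as lists of grouping sets. The helper names `rollup_sets` and `cube_sets` are hypothetical and are not part of Spark; the sketch only illustrates which grouping sets either syntax would have to produce.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(c1, ..., cn): every prefix of the column list, longest first."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(c1, ..., cn): every subset of the column list, largest first."""
    return [subset
            for size in range(len(cols), -1, -1)
            for subset in combinations(cols, size)]


print(rollup_sets(["col1", "col2"]))
# [('col1', 'col2'), ('col1',), ()]
print(cube_sets(["col1", "col2"]))
# [('col1', 'col2'), ('col1',), ('col2',), ()]
```

Both `GROUP BY col1, col2 WITH ROLLUP` and `GROUP BY ROLLUP(col1, col2)` would therefore describe the same three grouping sets; only the surface syntax differs.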
Note that we only need to support one-level grouping sets at this stage; nested grouping
sets are not supported.

The parser changes should look like this:
{code:SQL}

group-by-expressions

>>-GROUP BY----+-hive-sql-group-by-expressions-----+---><
               '-ansi-sql-grouping-set-expressions-'    

hive-sql-group-by-expressions

   .-,--------------.
   V                |
>>---+-expression-+-+---+-------------------------------------------------+-><
                        +--WITH ROLLUP------------------------------------+
                        +--WITH CUBE--------------------------------------+
                        '--GROUPING SETS--(--grouping-set-expressions--)--'

grouping-expression-list

   .-,--------------.  
   V                |  
>>---+-expression-+-+--><


grouping-set-expressions

    .-,----------------------------.
    V                              |
>>----+-expression--------------+--+-><
      |    .-,--------------.   |
      |    V                |   |
      '-(------expression---+-)-'


ansi-sql-grouping-set-expressions

>>-+-ROLLUP--(--grouping-expression-list--)---------+--><
   +-CUBE--(--grouping-expression-list--)-----------+   
   '-GROUPING SETS--(--grouping-set-expressions--)--'  
{code}
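
The diagrams above can be sketched as a small recursive-descent parser. This is only an illustration under stated assumptions, not Spark's actual parser: all names (`tokenize`, `parse_group_by`, the `(kind, columns, sets)` result shape) are hypothetical, and an "expression" is restricted to a bare column name. It accepts both the Hive and the ANSI forms and rejects nested grouping sets.

```python
import re

_KEYWORDS = {"GROUP", "BY", "WITH", "ROLLUP", "CUBE", "GROUPING", "SETS"}


def tokenize(text):
    """Split a clause into keywords, names, punctuation and stray characters."""
    tokens = re.findall(r"[A-Za-z_]\w*|[(),]|\S", text)
    # Keywords are case-insensitive; column names keep their spelling.
    return [t.upper() if t.upper() in _KEYWORDS else t for t in tokens]


class _Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def accept(self, *expected):
        """Consume the given token sequence if it comes next."""
        end = self.pos + len(expected)
        if tuple(self.tokens[self.pos:end]) == expected:
            self.pos = end
            return True
        return False

    def expect(self, *expected):
        if not self.accept(*expected):
            raise ValueError(f"expected {' '.join(expected)!r}, got {self.peek()!r}")

    def column(self):
        # Assumption: an "expression" is just a bare column name here.
        token = self.peek()
        if token is None or token in _KEYWORDS or not token.isidentifier():
            raise ValueError(f"expected a column name, got {token!r}")
        self.pos += 1
        return token

    def column_list(self):
        """grouping-expression-list: column {, column}"""
        cols = [self.column()]
        while self.accept(","):
            cols.append(self.column())
        return cols

    def grouping_sets(self):
        """'(' grouping-set-expressions ')', one level deep only."""
        self.expect("(")
        sets = []
        while True:
            if self.accept("("):
                # A further "(" here makes column() fail: no nested sets.
                sets.append(tuple(self.column_list()))
                self.expect(")")
            else:
                sets.append((self.column(),))
            if not self.accept(","):
                break
        self.expect(")")
        return sets


def parse_group_by(clause):
    """Return (kind, columns, sets) for either the Hive or the ANSI form."""
    p = _Parser(tokenize(clause))
    p.expect("GROUP", "BY")
    if p.accept("ROLLUP") or p.accept("CUBE"):
        kind = p.tokens[p.pos - 1]
        p.expect("(")
        result = (kind, p.column_list(), None)
        p.expect(")")
    elif p.accept("GROUPING", "SETS"):
        result = ("GROUPING SETS", [], p.grouping_sets())
    else:
        cols = p.column_list()
        if p.accept("WITH", "ROLLUP"):
            result = ("ROLLUP", cols, None)
        elif p.accept("WITH", "CUBE"):
            result = ("CUBE", cols, None)
        elif p.accept("GROUPING", "SETS"):
            result = ("GROUPING SETS", cols, p.grouping_sets())
        else:
            result = ("PLAIN", cols, None)
    if p.peek() is not None:
        raise ValueError(f"unexpected trailing token {p.peek()!r}")
    return result
```

In this sketch, `parse_group_by("GROUP BY col1, col2 WITH ROLLUP")` and `parse_group_by("GROUP BY ROLLUP(col1, col2)")` return the same tuple, while a nested form such as `GROUP BY GROUPING SETS(((col1)))` raises `ValueError`. Normalizing both spellings to one result is a design choice of the sketch, intended to show that the analyzer could treat the two forms identically.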
 


> Support ANSI-SQL compliant syntax for ROLLUP, CUBE and GROUPING SET
> -------------------------------------------------------------------
>
>                 Key: SPARK-24424
>                 URL: https://issues.apache.org/jira/browse/SPARK-24424
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Xiao Li
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

