flink-issues mailing list archives

From "lincoln.lee (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-6101) GroupBy fields with expression can not be selected
Date Mon, 20 Mar 2017 03:55:41 GMT

     [ https://issues.apache.org/jira/browse/FLINK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lincoln.lee updated FLINK-6101:
-------------------------------
    Description: 
Currently the Table API does not support selecting a groupBy field that is defined by an expression, neither by the original field name nor by the expression itself. For example, the query

{code}
 val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c, 'd, 'e)
        .groupBy('e, 'b % 3)
        .select('b, 'c.min, 'e, 'a.avg, 'd.count)
{code}
fails with
{code}
org.apache.flink.table.api.ValidationException: Cannot resolve [b] given input [e, ('b % 3), TMP_0, TMP_1, TMP_2].
{code}
(BTW, this query is also invalid in an RDBMS. SQL Server, for example, rejects it because column b is invalid in the select list: it is not contained in either an aggregate function or the GROUP BY clause.)

Selecting the expression itself
{code}
 val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c, 'd, 'e)
        .groupBy('e, 'b % 3)
        .select('b % 3, 'c.min, 'e, 'a.avg, 'd.count)
{code}
fails with the same error:
{code}
org.apache.flink.table.api.ValidationException: Cannot resolve [b] given input [e, ('b % 3), TMP_0, TMP_1, TMP_2].
{code}

Adding an alias in the groupBy clause, i.e. "groupBy('e, 'b % 3 as 'b)", does not help either.

The only workaround I have found is to compute the expression in a select before grouping:
{code}
 val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c, 'd, 'e)
        .select('a, 'b % 3 as 'b, 'c, 'd, 'e)
        .groupBy('e, 'b)
        .select('b, 'c.min, 'e, 'a.avg, 'd.count)
{code}
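The workaround is essentially a rewrite: the grouping expression is first projected under an alias, and the grouping then refers only to plain field names. A minimal Scala sketch of such a rewrite follows, assuming a toy expression tree; `Expr`, `Field`, `Mod`, `Alias` and `desugarGroupBy` are hypothetical names for illustration, not Flink's expression classes or planner API.

{code}
// Hypothetical toy expression tree; these are NOT Flink's expression classes.
sealed trait Expr
final case class Field(name: String) extends Expr
final case class Mod(child: Expr, divisor: Int) extends Expr
final case class Alias(child: Expr, name: String) extends Expr

// Rewrites groupBy(keys) on a table with `inputFields` into
// select(projection).groupBy(groupNames), mirroring the manual workaround.
def desugarGroupBy(inputFields: Seq[String], keys: Seq[Expr]): (Seq[Expr], Seq[String]) = {
  // Grouping keys must be plain fields or aliased expressions.
  val groupNames: Seq[String] = keys.map {
    case Field(n)    => n
    case Alias(_, n) => n
    case other       => throw new IllegalArgumentException(s"grouping expression needs an alias: $other")
  }
  val aliased: Map[String, Expr] = keys.collect { case a @ Alias(_, n) => n -> a }.toMap
  // An alias that reuses an input field name shadows that field (as 'b % 3 as 'b does).
  val projected = inputFields.map(f => aliased.getOrElse(f, Field(f)))
  // Aliases with new names are appended after the input fields.
  val extra = keys.collect { case a @ Alias(_, n) if !inputFields.contains(n) => a }
  (projected ++ extra, groupNames)
}
{code}

For the fields ('a, 'b, 'c, 'd, 'e) and the keys ('e, 'b % 3 as 'b), the sketch yields the projection ('a, 'b % 3 as 'b, 'c, 'd, 'e) and the grouping fields ('e, 'b), which is the hand-written workaround above.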

One way to solve this would be to support aliases in the groupBy clause (although this seems a bit at odds with SQL, even if the Table API has its own groupBy syntax).
I would prefer to support selecting the original groupBy expressions, which is consistent with SQL, as follows:
{code}
 val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c, 'd, 'e)
        .groupBy('e, 'b % 3)
        .select('b % 3, 'c.min, 'e, 'a.avg, 'd.count)
{code}


After looking into the code, I found that the problem lies in the groupBy implementation: validation does not take expressions in the groupBy clause into account. Note that a groupBy operation actually produces a new table, in which the groupBy keys essentially replace the original field references. A groupBy key can therefore only be selected (or not) as a whole; no other calculation can be done on it.
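A minimal Scala sketch of the validation rule described above, assuming a toy expression tree: each select item is compared with the grouping expressions as a whole. An item equal to a grouping key is replaced by a reference to that key, an aggregate is accepted, and anything else (such as a bare 'b when grouping by 'b % 3) is rejected, which matches the SQL behavior mentioned earlier. `Expr`, `Field`, `Mod`, `Agg`, `KeyRef`, `resolveItem` and `resolveSelect` are hypothetical names, not Flink's expression classes or validation code, and the sketch does not check the arguments of aggregates.

{code}
// Hypothetical toy expression tree; these are NOT Flink's expression classes.
sealed trait Expr
final case class Field(name: String) extends Expr
final case class Mod(child: Expr, divisor: Int) extends Expr
final case class Agg(fn: String, child: Expr) extends Expr // e.g. Agg("min", Field("c"))
final case class KeyRef(index: Int) extends Expr          // i-th grouping key of the grouped table

// A select item is valid only if it equals a grouping expression as a whole
// (structural equality of case classes) or if it is an aggregate.
// Aggregate arguments are not checked in this sketch.
def resolveItem(groupKeys: Seq[Expr], item: Expr): Either[String, Expr] = {
  val i = groupKeys.indexOf(item)
  if (i >= 0) Right(KeyRef(i))
  else item match {
    case a: Agg => Right(a)
    case other  => Left(s"Cannot resolve $other: not a grouping key or an aggregate")
  }
}

// Resolves the whole select list, reporting the first invalid item.
def resolveSelect(groupKeys: Seq[Expr], select: Seq[Expr]): Either[String, Seq[Expr]] = {
  val resolved = select.map(resolveItem(groupKeys, _))
  resolved.collectFirst { case Left(err) => err } match {
    case Some(err) => Left(err)
    case None      => Right(resolved.collect { case Right(e) => e })
  }
}
{code}

With the grouping keys ('e, 'b % 3), selecting ('b % 3, 'c.min, 'e) resolves to two key references and one aggregate, while selecting the bare 'b yields a Left carrying the error message.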
 
What do you think?

  was:
(Same as the current description, except that the last paragraph said "replace the origin field reference" instead of "replace the original field reference".)


> GroupBy fields with expression can not be selected
> --------------------------------------------------
>
>                 Key: FLINK-6101
>                 URL: https://issues.apache.org/jira/browse/FLINK-6101
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>            Reporter: lincoln.lee
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
