flink-issues mailing list archives

From "sunjincheng (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-6428) Add support DISTINCT in dataStream SQL
Date Wed, 03 May 2017 02:23:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994172#comment-15994172 ]

sunjincheng edited comment on FLINK-6428 at 5/3/17 2:22 AM:
------------------------------------------------------------

Hi [~rtudoran], thanks for paying attention to this JIRA.

In a standard database, the `DISTINCT` keyword can be used in two situations:
*  in the `SELECT Clause`, e.g.: `SELECT DISTINCT name FROM table`
*  in the `AGG Clause`, e.g.: `COUNT([ALL|DISTINCT] expression)`, `SUM([ALL|DISTINCT] expression)`, etc.

First, [FLINK-6249 | https://issues.apache.org/jira/browse/FLINK-6249] covers the `AGG Clause`
case, and this JIRA covers the `SELECT Clause` case.
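To make the difference between the two cases concrete, here is a minimal sketch that runs both forms on the sample data from the issue description. It uses SQLite (Python's built-in `sqlite3` module) as a stand-in batch database, so it only illustrates the SQL semantics and says nothing about how Flink would execute the query on a stream.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a "standard database".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("kevin", 28), ("sunny", 6), ("jack", 6)],
)

# DISTINCT in the SELECT clause: duplicate result rows are removed.
# The result is sorted here because SQL does not guarantee row order.
distinct_ages = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT age FROM MyTable")
)

# DISTINCT inside an aggregate: duplicates are removed before aggregating.
count_all, count_distinct, sum_distinct = conn.execute(
    "SELECT COUNT(age), COUNT(DISTINCT age), SUM(DISTINCT age) FROM MyTable"
).fetchone()

print(distinct_ages)                            # [6, 28]
print(count_all, count_distinct, sum_distinct)  # 3 2 34
```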

Next, regarding the ever-growing set of elements: the limitation tends to be the back-end
storage (Flink state). In theory, external storage is unbounded (the user can control it and
plan for it), so from this point of view `DISTINCT` on an infinite stream can be supported.
In addition, with external storage such as RocksDB, the user can set a TTL according to the
actual volume of business data to ensure that the external storage keeps working properly.
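The idea of bounding state with a TTL can be sketched as follows. This is a plain-Python illustration only, not Flink's state API or a RocksDB configuration; the function name `streaming_distinct`, the `ttl` parameter, and the `(timestamp, value)` event shape are all assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, Iterator, Tuple


def streaming_distinct(
    events: Iterable[Tuple[int, Hashable]], ttl: int
) -> Iterator[Hashable]:
    """Yield a value the first time it is seen, forgetting values after `ttl`.

    `events` must be (timestamp, value) pairs in non-decreasing timestamp
    order. A value's state entry is dropped once it has not been seen for
    at least `ttl` time units.
    """
    last_seen: Dict[Hashable, int] = {}  # hypothetical stand-in for keyed state
    for ts, value in events:
        # Evict expired entries so the state does not grow without bound.
        expired = [v for v, seen in last_seen.items() if ts - seen >= ttl]
        for v in expired:
            del last_seen[v]
        if value not in last_seen:
            yield value  # first occurrence within the TTL horizon
        last_seen[value] = ts  # refresh the entry on every occurrence


events = [(0, 28), (1, 6), (2, 6), (20, 6)]
short_ttl = list(streaming_distinct(events, ttl=10))
long_ttl = list(streaming_distinct(events, ttl=100))
print(short_ttl)  # [28, 6, 6]
print(long_ttl)   # [28, 6]
```

With the short TTL, the value 6 is emitted a second time because its state had already expired when it reappeared. That is the trade-off the user has to weigh when choosing a TTL: a smaller state footprint against results that are only distinct within the TTL horizon.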

So, IMO, we can support the `DISTINCT` feature in the `SELECT Clause` and remind the user to
pay attention to controlling external storage. What do you think?

Thanks,
SunJincheng


was (Author: sunjincheng121):
Hi [~rtudoran], thanks for paying attention to this JIRA.

In a standard database, the `DISTINCT` keyword can be used in two situations:
*  in the `SELECT Clause`, e.g.: `SELECT DISTINCT name FROM table`
*  in the `AGG Clause`, e.g.: `COUNT([ALL|DISTINCT] expression)`, `SUM([ALL|DISTINCT] expression)`, etc.

First, [FLINK-6249 | https://issues.apache.org/jira/browse/FLINK-6249] covers the `AGG Clause`
case, and this JIRA covers the `SELECT Clause` case.

Next, regarding the ever-growing set of elements: the limitation tends to be the back-end
storage (Flink state). In theory, external storage is unbounded (the user can control it and
plan for it), so from this point of view `DISTINCT` on an infinite stream can be supported.
In addition, with external storage such as RocksDB, the user can set a TTL according to the
actual volume of business data to ensure that the external storage keeps working properly.

So, IMO, we can support the `DISTINCT` feature and remind the user to pay attention to
controlling external storage. What do you think?

Thanks,
SunJincheng

> Add support DISTINCT in dataStream SQL
> --------------------------------------
>
>                 Key: FLINK-6428
>                 URL: https://issues.apache.org/jira/browse/FLINK-6428
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>
> Add support for DISTINCT in DataStream SQL as follows:
> DATA:
> {code}
> (name, age)
> (kevin, 28),
> (sunny, 6),
> (jack, 6)
> {code}
> SQL:
> {code}
> SELECT DISTINCT age FROM MyTable
> {code}
> RESULTS:
> {code}
> 28, 6
> {code}
> [~fhueske] do we need this feature?



