flink-issues mailing list archives

From "Shaoxuan Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5386) Refactoring Window Clause
Date Tue, 17 Jan 2017 02:12:27 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824844#comment-15824844 ]

Shaoxuan Wang commented on FLINK-5386:

[~sunjincheng121], thanks for the updates.

Hi [~fhueske],
The main reason we propose this change is the row window. A row window may have
no groupBy keys. Under the current proposal in FLIP-11, the Table API looks as follows:
  .window(RowWindow as 'x)
  .select('b.count over 'x as 'xcnt, 'x.start, 'x.end)
If we want to partition the data and trigger the result using a window function, we have to
translate the .window operator into a kind of grouping query plan, which is a little odd.
With this proposal, the groupBy operator can group not only by keys but also by a window
clause. I think this is the correct semantics. The above example will be written this way:

  .window(RowWindow as 'x)
  .select('b.count over 'x as 'xcnt, 'x.start, 'x.end)
What do you think?

This change gives users more flexibility: they can still use both a window clause and a
groupBy clause (just move the window definition before groupBy) if they want.
I think you have raised a good question about the "scope of window" for a batch window on a certain
column (which could be removed by some operators). We should make sure this still works.
We will check the design and add test cases for this.
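
The "scope of window" concern can be illustrated with a toy model. This is only a sketch and not Flink code: `Table`, `Window`, and the column name `rowtime` are hypothetical stand-ins, and the check shown is one assumed way a planner could reject a groupBy on a window whose column was dropped by an earlier operator.

```scala
object WindowScopeSketch {
  // Hypothetical model: a window alias bound to the column it is defined on.
  final case class Window(alias: String, timeField: String)

  // Hypothetical model of a table: its visible columns plus registered windows.
  final case class Table(fields: Set[String], windows: Map[String, Window] = Map.empty) {
    // Register a window under its alias, before any groupBy refers to it.
    def window(w: Window): Table = copy(windows = windows + (w.alias -> w))

    // A projection keeps only the listed columns; window definitions survive.
    def select(kept: String*): Table = copy(fields = kept.toSet)

    // groupBy accepts window aliases and plain keys alike. A window alias is
    // only valid while the column it is defined on is still visible.
    def groupBy(refs: String*): Either[String, Table] = {
      val problems = refs.flatMap { ref =>
        windows.get(ref) match {
          case Some(w) if !fields.contains(w.timeField) =>
            Some(s"window '${w.alias}' needs column '${w.timeField}', which is no longer in scope")
          case Some(_) => None
          case None if fields.contains(ref) => None
          case None => Some(s"unknown key or window '$ref'")
        }
      }
      if (problems.isEmpty) Right(this) else Left(problems.mkString("; "))
    }
  }

  def main(args: Array[String]): Unit = {
    val source = Table(Set("key", "value", "rowtime")).window(Window("w", "rowtime"))
    // The window column is still visible, so grouping by 'w' and 'key' is accepted.
    println(source.groupBy("w", "key"))
    // The projection drops 'rowtime', so the same groupBy is rejected.
    println(source.select("key", "value").groupBy("w", "key"))
  }
}
```

In this model the second call returns a `Left` describing the missing column, which is the kind of case the test cases mentioned above would need to cover.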

> Refactoring Window Clause
> -------------------------
>                 Key: FLINK-5386
>                 URL: https://issues.apache.org/jira/browse/FLINK-5386
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: sunjincheng
>            Assignee: sunjincheng
> Similar to SQL, a window clause is defined "as" a symbol which is explicitly used in
> groupBy/over. We are proposing to refactor the way to write groupBy+window in the Table API as follows:

> {code}
> val windowedTable = table
>  .window(Slide over 10.milli every 5.milli as 'w1)
>  .window(Tumble over 5.milli as 'w2)
>  .groupBy('w1, 'key)
>  .select('string, 'int.count as 'count, 'w1.start)
>  .groupBy('w2, 'key)
>  .select('string, 'count.sum as 'sum2)
>  .window(Tumble over 5.milli as 'w3)
>  .groupBy('w3) // windowAll
>  .select('sum2, 'w3.start, 'w3.end)
> {code}
> In this way, we can remove both GroupWindowedTable and the window() method in GroupedTable,
> which makes the API a bit cleaner. In addition, for row windows, we need to define the window
> clause as a symbol anyway. This change will make the API of window and row-window consistent. Example
> for row-window:
> {code}
>   .window(RowXXXWindow as 'x, RowYYYWindow as 'y)
>   .select('a, 'b.count over 'x as 'xcnt, 'c.count over 'y as 'ycnt, 'x.start,
> {code}
> What do you think? [~fhueske] [~twalthr]
