flink-user mailing list archives

From Stefano Bortoli <stefano.bort...@huawei.com>
Subject RE: GROUP BY TUMBLE on ROW range
Date Wed, 18 Oct 2017 13:26:02 GMT
Great, thanks for the explanation. I have now noticed that the examples are indeed for the Table
API. I believe an over window is sufficient for my purpose right now; I was just curious.


From: Fabian Hueske [mailto:fhueske@gmail.com]
Sent: Tuesday, October 17, 2017 9:24 PM
To: Stefano Bortoli <stefano.bortoli@huawei.com>
Cc: user@flink.apache.org
Subject: Re: GROUP BY TUMBLE on ROW range

Hi Stefano,
this is not supported in Flink's SQL and we would need new Group Window functions (like TUMBLE)
for this.
A TUMBLE_COUNT function would be somewhat similar to SESSION, which also requires checks on
the sorted neighboring rows to identify the window of a row.
Such a function would first need to be added to Calcite and then integrated with Flink.

A tumble count could also be expressed in plain SQL, but it wouldn't be very intuitive. You would
have to
- define an over window (maybe partitioned on some key), sorted on time, with a ROW_NUMBER function
that assigns increasing numbers to rows.
- do a group by on the row number divided by the window size (integer division, so that
consecutive rows share a group; a modulo would interleave them).
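
A minimal sketch of the two steps above, written in plain Python rather than SQL so that it makes no claims about which functions Flink's SQL actually accepts. The function name `tumble_count_buckets` and the sample rows are hypothetical, and rows are assumed to be `(timestamp, value)` tuples. The bucket is computed by integer division of the zero-based row number, which keeps consecutive rows together; a modulo would interleave them.

```python
from collections import defaultdict


def tumble_count_buckets(rows, size):
    """Group rows into consecutive buckets of `size` rows, ordered by timestamp."""
    # Step 1: sort on time and assign increasing row numbers,
    # mimicking ROW_NUMBER() over a time-ordered over window.
    ordered = sorted(rows, key=lambda r: r[0])
    buckets = defaultdict(list)
    for row_number, row in enumerate(ordered, start=1):
        # Step 2: group by the zero-based row number integer-divided
        # by the window size, so rows 1..size land in bucket 0, and so on.
        buckets[(row_number - 1) // size].append(row)
    return dict(buckets)


# Hypothetical input: five rows arriving out of order, window size 2.
rows = [(3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")]
counts = {b: len(rs) for b, rs in tumble_count_buckets(rows, 2).items()}
print(counts)  # {0: 2, 1: 2, 2: 1}
```

Note that the last bucket can be partial (here it holds a single row), whereas a real count window would typically only fire once it is full.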
Btw. count windows are supported by the Table API.
Best, Fabian

2017-10-17 17:16 GMT+02:00 Stefano Bortoli <stefano.bortoli@huawei.com>:
Hi all,
Is there a way to use a tumble window group by with row range in streamSQL?
I mean, something like this:
//      "SELECT COUNT(*) " +
//             "FROM T1 " +

However, even after looking at the tests and at the “row interval expression generation”,
I could not find any examples in SQL. I know it is supported by the stream APIs, where countWindow
is the chosen abstraction.

      .window(Tumble over 2.rows on 'long as 'w)

I fear I am missing something simple. Thanks a lot for the support guys!

