flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: [DISCUSS] Development of SQL OVER / Table API Row Windows for streaming tables
Date Mon, 23 Jan 2017 21:54:39 GMT
Hi Haohui,

our plan was in fact to piggy-back on Calcite and use the TUMBLE function
[1] once is it is available (CALCITE-1345 [2]).
Unfortunately, this issue does not seem to be very active, so I don't know
what the progress is.

I would suggest to move the discussion about group windows to a separate
thread and keep this one focused on the organization of the SQL OVER
windows.

Best,
Fabian

[1] http://calcite.apache.org/docs/stream.html)
[2] https://issues.apache.org/jira/browse/CALCITE-1345

2017-01-23 22:42 GMT+01:00 Haohui Mai <ricetons@gmail.com>:

> Hi Fabian,
>
> FLINK-4692 has added the support for tumbling window and we are excited to
> try it out and expose it as a SQL construct.
>
> Just curious -- what's your thought on the SQL syntax on tumbling window?
>
> Implementation wise it might make sense to think tumbling window as a
> special case of the sliding window.
>
> The problem I see is that the OVER construct might be insufficient to
> support all the use cases of tumbling windows. For example, it fails to
> express tumbling windows that have fractional time units (as pointed out in
> http://calcite.apache.org/docs/stream.html).
>
> It looks to me that the Calcite / Azure Stream Analytics have introduced a
> new construct (TUMBLE / TUMBLINGWINDOW) to address this issue.
>
> Do you think it is a good idea to follow the same conventions? Your ideas
> are appreciated.
>
> Regards,
> Haohui
>
>
> On Mon, Jan 23, 2017 at 1:02 PM Haohui Mai <ricetons@gmail.com> wrote:
>
> > +1
> >
> > We are also quite interested in these features and would love to
> > participate and contribute.
> >
> > ~Haohui
> >
> > On Mon, Jan 23, 2017 at 7:31 AM Fabian Hueske <fhueske@gmail.com> wrote:
> >
> >> Hi everybody,
> >>
> >> it seems that currently several contributors are working on new features
> >> for the streaming Table API / SQL around row windows (as defined in
> >> FLIP-11
> >> [1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680,
> >> FLINK-5584).
> >> Since these efforts overlap quite a bit I spent some time thinking about
> >> how we can approach these features and how to avoid overlapping
> >> contributions.
> >>
> >> The challenge here is the following. Some of the Table API row windows
> as
> >> defined by FLIP-11 [1] are basically SQL OVER windows while other cannot
> >> be
> >> easily expressed as such (TumbleRows for row-count intervals,
> >> SessionRows).
> >> However, since Calcite already supports SQL OVER windows, we can reuse
> the
> >> optimization logic for some of the Table API row windows. I also thought
> >> about the semantics of the TumbleRows and SessionRows windows as defined
> >> in
> >> FLIP-11 and came to the conclusion that these are not well defined in
> >> FLIP-11 and should rather be defined as SlideRows windows with a special
> >> PARTITION BY clause.
> >>
> >> I propose to approach SQL OVER windows and Table API row windows as
> >> follows:
> >>
> >> We start with three simple cases for SQL OVER windows (not Table API
> yet):
> >>
> >> * OVER RANGE for event time
> >> * OVER RANGE for processing time
> >> * OVER ROW for processing time
> >>
> >> All cases fulfill the following restrictions:
> >> - All aggregations in SELECT must refer to the same window.
> >> - PARTITION BY may not contain the rowtime attribute.
> >> - ORDER BY must be on rowtime attribute (for event time) or on a marker
> >> function that indicates processing time. Additional sort attributes are
> >> not
> >> supported initially.
> >> - only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x
> >> PRECEDING AND CURRENT ROW" are supported.
> >>
> >> OVER ROW for event time cannot be easily supported. With event time, we
> >> may
> >> have late records which need to be injected into the order of records.
> >> When
> >> a record in injected in to the order where a row-count window has
> already
> >> been computed, this and all following windows will change. We could
> either
> >> drop the record or sent out many retraction records. I think it is best
> to
> >> not open this can of worms at this point.
> >>
> >> The rational for all of the above restrictions is to have first versions
> >> of
> >> OVER windows soon.
> >> Once we have the above cases covered we can extend and remove
> limitations
> >> as follows:
> >>
> >> - Table API SlideRow windows (with the same restrictions as above). This
> >> will be mostly API work since the execution part has been solved before.
> >> - Add support for FOLLOWING (except UNBOUNDED FOLLOWING)
> >> - Add support for different windows in SELECT. All windows must be
> >> partitioned and ordered in the same way.
> >> - Add support for additional ORDER BY attributes (besides time).
> >>
> >> As I said before, TumbleRows and SessionRows windows as in FLIP-11 are
> not
> >> well defined, IMO.
> >> They can be expressed as SlideRows windows with special partitioning
> >> (partitioning on fixed, non-overlapping time ranges for TumbleRows, and
> >> gap-separated, non-overlapping time ranges for SessionRows)
> >> I would not start to work on those yet.
> >>
> >> I would like to close all related JIRA issues (FLINK-4678, FLINK-4679,
> >> FLINK-4680, FLINK-5584) and restructure the development of these
> features
> >> as outlined above with corresponding JIRA issues.
> >>
> >> What do others think? (I cc'ed the contributors assigned to the above
> JIRA
> >> issues)
> >>
> >> Best, Fabian
> >>
> >> [1]
> >>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-
> 11%3A+Table+API+Stream+Aggregations
> >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message