apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Weise <tho...@datatorrent.com>
Subject Re: Windowed Operators
Date Fri, 13 May 2016 16:05:26 GMT
Hi Siyuan,

This looks good. It would be good to discuss proposed interfaces at a high
level and also how they will converge with existing aggregations
(dimensions computation) and associated state management. The window state
checkpointing ties into managed state and spillable data structures work as
well as the work for the join operator that is in progress, so it will be
good to engage relevant folks in this discussion.

Thanks,
Thomas


On Wed, May 11, 2016 at 11:47 AM, Siyuan Hua <siyuan@datatorrent.com> wrote:

> Hi guys,
>
> I just create a ticket for windowed operators. It is the prerequisite for
> High-level API and Beam integration.
>
> As per our recent several discussions in the community. A group of Windowed
> Operators that delivers the window semantic follows the google Data Flow
> model(https://cloud.google.com/dataflow/) is very important.
> The operators should be designed and implemented in a way for
> High-level API
> Beam translation
> Easy to use with other popular operator
>
> Hierarchy of the operators,
> The windowed operators should cover all possible transformations that
> require window, and batch processing is also considered as special window
> called global window
>
>                    +-------------------+
>        +---------> |  WindowedOperator | <--------+
>        |           +--------+----------+          |
>        |                    ^      ^--------------------------------+
>        |                    |                     |                 |
>        |                    |                     |                 |
> +------+--------+    +------+------+      +-------+-----+    +------+-----+
> |CombineOperator|    |GroupOperator|      |KeyedOperator|    |JoinOperator|
> +---------------+    +-------------+      +------+------+    +-----+------+
>                                    +---------^   ^                 ^
>                                    |             |                 |
>                           +--------+---+   +-----+----+       +----+----+
>                           |KeyedCombine|   |KeyedGroup|       | CoGroup |
>                           +------------+   +----------+       +---------+
>
>
> Combine operation includes all operations that combine all tuples in one
> window into one or small number of tuples, Group operation group all tuples
> in one window, Join and CoGroup are used to join and group tuples from
> different inputs.
>
> Components:
> Window Component, it includes configuration, window state that should be
> checkpointed, etc. It should support NonMergibleWindow(fixed or slide)
> MergibleWindow(Session)
>
>  Trigger:
> It should support early trigger, late trigger. It should support
> customizable trigger behaviour
>
> Outside components:
> Watermark generator, can be plugged into input source to generate watermark
> Tuple util components, The operator should be only working on Tuples with
> schema. It can handle either predefined tuple type or give a declarative
> API to describe the user defined tuple class
>
> Most component API should be reused in High-Level API
>
> This is the umbrella ticket, separate tickets would be created for
> different components and operators respectively
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message