beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (BEAM-2535) Allow explicit output time independent of firing specification for all timers
Date Thu, 29 Jun 2017 20:58:00 GMT

     [ https://issues.apache.org/jira/browse/BEAM-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Kenneth Knowles updated BEAM-2535:
----------------------------------
    Description: 
Today, we have insufficient control over the event time timestamp of elements output from
a timer callback.

1. For an event time timer, it is the timestamp of the timer itself.
2. For a processing time timer, it is the current input watermark at the time of processing.

But for both of these, we may want to reserve the right to output a particular time, aka set
a "watermark hold".

A naive implementation of a {{TimerWithWatermarkHold}} would work for making sure output is
not droppable, but does not fully explain window expiration and late data/timer dropping.

In the natural interpretation of a timer as a feedback loop on a transform, timers should
be viewed as another channel of input, with a watermark, and items on that channel _all need
event time timestamps even if they are delivered according to a different time domain_.

I propose that the specification for when a timer should fire should be separated (with nice
defaults) from the specification of the event time of resulting outputs. These timestamps
will determine a side channel with a new "timer watermark" that constrains the output watermark.

 - We still need to fire event time timers according to the input watermark, so that event
time timers fire.
 - Late data dropping and window expiration will be in terms of the minimum of the input watermark
and the timer watermark. In this way, whenever a timer is set, the window is not going to
be garbage collected.
 - We will need to make sure we have a way to "wake up" a window once it is expired; this
may be as simple as exhausting the timer channel as soon as the input watermark indicates
expiration of a window

This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It seems reasonable
to use timers as an implementation detail (e.g. in runners-core utilities) without wanting
any of this additional machinery. For example, if there is no possibility of output from the
timer callback.

  was:
Today, we have insufficient control over the event time timestamp of elements output from
a timer callback.

1. For an event time timer, it is the timestamp of the timer itself.
2. For a processing time timer, it is the current input watermark at the time of processing.

But for both of these, we may want to reserve the right to output a particular time, aka set
a "watermark hold".

A naive implementation of a {{TimerWithWatermarkHold}} would work for making sure output is
not droppable, but does not fully explain window expiration and late data/timer dropping.

In the natural interpretation of a timer as a feedback loop on a transform, timers should
be viewed as another channel of input, with a watermark, and items on that channel _all need
event time timestamps even if they are delivered according to a different time domain_.

I propose that the specification for when a timer should fire should be separated (with nice
defaults) from the specification of the event time of resulting outputs. These timestamps
will determine a side channel with a new "timer watermark" that constrains the output watermark.

 - We still need to fire event time timers according to the input watermark, so that event
time timers fire.
 - Late data dropping and window expiration will be in terms of the minimum of the input watermark
and the timer watermark. In this way, whenever a timer is set, the window is not going to
be garbage collected.
 - We will need to make sure we have a way to "wake up" a window once it is expired; this
may be as simple as exhausting the timer channel as soon as the input watermark indicates
expiration of a window


> Allow explicit output time independent of firing specification for all timers
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-2535
>                 URL: https://issues.apache.org/jira/browse/BEAM-2535
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model, sdk-java-core
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>
> Today, we have insufficient control over the event time timestamp of elements output
from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
> 2. For a processing time timer, it is the current input watermark at the time of processing.
> But for both of these, we may want to reserve the right to output a particular time,
aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making sure output
is not droppable, but does not fully explain window expiration and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, timers should
be viewed as another channel of input, with a watermark, and items on that channel _all need
event time timestamps even if they are delivered according to a different time domain_.
> I propose that the specification for when a timer should fire should be separated (with
nice defaults) from the specification of the event time of resulting outputs. These timestamps
will determine a side channel with a new "timer watermark" that constrains the output watermark.
>  - We still need to fire event time timers according to the input watermark, so that
event time timers fire.
>  - Late data dropping and window expiration will be in terms of the minimum of the input
watermark and the timer watermark. In this way, whenever a timer is set, the window is not
going to be garbage collected.
>  - We will need to make sure we have a way to "wake up" a window once it is expired;
this may be as simple as exhausting the timer channel as soon as the input watermark indicates
expiration of a window
> This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It seems reasonable
to use timers as an implementation detail (e.g. in runners-core utilities) without wanting
any of this additional machinery. For example, if there is no possibility of output from the
timer callback.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message