beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (BEAM-101) Data-driven triggers
Date Thu, 01 Jun 2017 17:53:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033379#comment-16033379
] 

Kenneth Knowles edited comment on BEAM-101 at 6/1/17 5:52 PM:
--------------------------------------------------------------

I believe you could address the use case in a couple of ways:

1. A {{DoFn}} that uses state and timers to implement this behavior. You can do essentially
any custom triggering with this. The only issue is that your runner needs to support it.
2. The approach of a {{CombineFn}} does not work as described - you cannot apply it right
at the GBK because the element type may not match. You cannot apply it right at the {{Window.into}}
because the element may lead to many output elements and there's not really a good story around
propagating metadata in that case. You could have a {{CombineFn<Instant, AccumT, Boolean>}}
and it could work.

The other trouble is that including a {{CombineFn}} in a trigger is not as portable; it needs
a different execution strategy that calls a UDF, possibly over the Fn API. Today, triggers
are just syntax, so they can be executed easily and efficiently within a runner via any means.

The most coherent approach I know of for custom triggers (which is not really fleshed out)
is to do the combine on a PCollection explicitly and then have a trigger that just references
that PCollection. The runner then just needs to be able to decode a bool, not run a {{CombineFn}}.

In my mind, data-driven trigger means a trigger that is aware of the details of the data type.
A timestamp-driven trigger would not really be data-driven in this way. But until we have
some clear design for custom triggers, we could definitely consider adding new syntax to triggers
for particular common uses. If the existing solutions don't work for you, please open a JIRA
for your specific use case.


was (Author: kenn):
I believe you could address the use case in a couple of ways:

1. A {{DoFn}} that uses state and timers to implement this behavior. You can do essentially
any custom triggering with this. The only issue is that your runner needs to support it.
2. The approach of a {{CombineFn}} does not work as described - you cannot apply it right
at the GBK because the element type may not match. You cannot apply it right at the {{Window.into}}
because the element may lead to many output elements and there's not really a good story around
propagating metadata in that case. You could have a {{CombineFn<Instant, AccumT, Boolean>}}
and it could work.

The other trouble is that including a {{CombineFn}} in a trigger is not as portable; it needs
a different execution strategy that calls a UDF, possibly over the Fn API. Today, triggers
are just syntax, so they can be executed. The most coherent approach I know of (which is not
really fleshed out) is to do the combine on a PCollection explicitly and then have a trigger
that just references that PCollection. The runner then just needs to be able to decode a bool,
not run a {{CombineFn}}.

In my mind, data-driven trigger means a trigger that is aware of the details of the data type.
A timestamp-driven trigger would not really be data-driven in this way. But until we have
some clear design for custom triggers, we could definitely consider adding new syntax to triggers
for particular common uses. If the existing solutions don't work for you, please open a JIRA
for your specific use case.

> Data-driven triggers
> --------------------
>
>                 Key: BEAM-101
>                 URL: https://issues.apache.org/jira/browse/BEAM-101
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Robert Bradshaw
>
> For some applications, it's useful to declare a pane/window to be emitted (or finished)
based on its contents. The simplest of these is the AfterCount trigger, but more sophisticated
predicates could be constructed.
> The requirements for consistent trigger firing are essentially that the state of the
trigger form a lattice and that the "should fire?" question is a monotonic predicate on the
lattice. Basically it asks "are we high enough up the lattice?"
> Because the element types may change between the application of Windowing and the actuation
of the trigger, one idea is to extract the relevant data from the element at Windowing and
pass it along implicitly where it can be combined and inspected in a type safe way later (similar
to how timestamps and windows are implicitly passed with elements).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message