beam-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kenneth Knowles <...@google.com>
Subject Re: Windowing question
Date Thu, 25 Jan 2018 21:50:37 GMT
Ah, this is a common thought that comes up. The issue is that processing
time and event time in Beam are independent. Windows aren't processed one
after another - they are properties of whatever the data is describing, not
tied to how they are computed over. The simplest way to avoid tripping up
is to consider all windows to be processed simultaneously.

If you want to process things in some sorted order, such as event timestamp
order, you may want to look at the Sorter extension:
https://beam.apache.org/documentation/sdks/java-extensions/#sorter

Kenn

On Thu, Jan 25, 2018 at 5:37 AM, Tim Ross <Tim_Ross@ultimatesoftware.com>
wrote:

> The are many thousand inputs per window. I only need to only calculate
> certain statistics based on a new element, i.e. preset in current window
> but not previous window, and the last element, i.e. was in previous window
> but not the current window.
>
>
>
> Thanks,
>
> Tim
>
> *From: *Kenneth Knowles <klk@google.com>
> *Reply-To: *"user@beam.apache.org" <user@beam.apache.org>
> *Date: *Wednesday, January 24, 2018 at 3:13 PM
>
> *To: *"user@beam.apache.org" <user@beam.apache.org>
> *Subject: *Re: Windowing question
>
>
>
> Can you say more about how you treat the different sorts of inputs?
>
>
>
> On Wed, Jan 24, 2018 at 11:26 AM, Tim Ross <Tim_Ross@ultimatesoftware.com>
> wrote:
>
> Yes that is what the pipeline I came up with looks like. However the next
> step in the pipeline is new/expired logic. I have tried a variety of things
> but none have gotten me close to what I want. Hence my questioning to this
> mailing list.
>
>
>
>
>
> Thanks,
>
> Tim
>
> *From: *Kenneth Knowles <klk@google.com>
> *Reply-To: *"user@beam.apache.org" <user@beam.apache.org>
> *Date: *Wednesday, January 24, 2018 at 2:21 PM
>
>
> *To: *"user@beam.apache.org" <user@beam.apache.org>
> *Subject: *Re: Windowing question
>
>
>
> A first approach would be to just not translate any of the new/expired
> logic. Beam does have the concept of expiring a window, though it is true
> that only particular transformations actually drop expired data. Have you
> tried something along the lines of this?
>
>
>
>     pipeline.begin()
>
>         .apply(FooIO.read(...))
>
>         .apply(Window.into(SlidingWindows.of(Duration.
> standardMinutes(5)).every(Duration.standardSeconds(30))))
>
>         .apply(ParDo.of(new DoFn<>() { ... get into proper format ... })
>
>         ... the rest of the logic ...
>
>
>
> Just a very vague sketch of what, actually, most pipelines look like.
>
>
>
> Kenn
>
>
>
> On Wed, Jan 24, 2018 at 11:07 AM, Tim Ross <Tim_Ross@ultimatesoftware.com>
> wrote:
>
> I am trying to convert an existing Apache Storm Bolt into an Apache Beam
> pipeline. The storm bolt used sliding windows with a duration of 5 minute
> and a period of 30 seconds. After doing some initial transforms to get data
> in the proper format it would process all elements which were new or
> expired.
>
>
>
> Since Beam doesn’t have the concept of new and expired data in a window
> I’m trying to figure out how one would accomplish this.
>
>
>
>
>
> Thanks,
>
> Tim
>
> *From: *Kenneth Knowles <klk@google.com>
> *Reply-To: *"user@beam.apache.org" <user@beam.apache.org>
> *Date: *Wednesday, January 24, 2018 at 1:45 PM
>
>
> *To: *"user@beam.apache.org" <user@beam.apache.org>
> *Subject: *Re: Windowing question
>
>
>
> Generally, Beam will discard expired data for you (including state). Can
> you describe more? What is your windowing strategy? What are the edge
> triggers?
>
>
>
> On Wed, Jan 24, 2018 at 10:37 AM, Tim Ross <Tim_Ross@ultimatesoftware.com>
> wrote:
>
> I am just trying to do certain processing on edge triggers, i.e. new or
> expired data, to reduce the overall processing of a very large stream.
>
>
>
> How would I go about doing that with state? As I understand it, state is
> tied to key and window.
>
>
>
> Thanks,
>
> Tim
>
> *From: *Kenneth Knowles <klk@google.com>
> *Reply-To: *"user@beam.apache.org" <user@beam.apache.org>
> *Date: *Wednesday, January 24, 2018 at 1:25 PM
> *To: *"user@beam.apache.org" <user@beam.apache.org>
> *Subject: *Re: Windowing question
>
>
>
> A little clarification: in Beam an element exists in a single window,
> mathematically speaking. So when you use SlidingWindows, for example, to
> assign multiple windows this "copies" the value for each window, and that
> is how you should think of it, from a calculation point of view. Under the
> hood, a compressed representation is often used, but not in all situations.
>
>
>
> Kenn
>
>
>
> On Wed, Jan 24, 2018 at 9:45 AM, Robert Bradshaw <robertwb@google.com>
> wrote:
>
> No, Apache Beam doesn't offer this explicitly. You could accomplish it
> using State, but perhaps if you clarified what you were trying to
> accomplish by using these mechanisms there'd be another way to do the
> same thing.
>
>
> On Wed, Jan 24, 2018 at 7:03 AM, Tim Ross <Tim_Ross@ultimatesoftware.com>
> wrote:
> > Is there anything comparable to Apache Storm’s Window.getNew and
> > Window.getExpired in Apache Beam?  How would I determine if an element is
> > new or expired in consecutive windows?
> >
> >
> >
> > Thanks,
> >
> > Tim
> >
> > This e-mail message and any attachments to it are intended only for the
> > named recipients and may contain legally privileged and/or confidential
> > information. If you are not one of the intended recipients, do not
> duplicate
> > or forward this e-mail message.
>
>
>
> This e-mail message and any attachments to it are intended only for the
> named recipients and may contain legally privileged and/or confidential
> information. If you are not one of the intended recipients, do not
> duplicate or forward this e-mail message.
>
>
>
> This e-mail message and any attachments to it are intended only for the
> named recipients and may contain legally privileged and/or confidential
> information. If you are not one of the intended recipients, do not
> duplicate or forward this e-mail message.
>
>
>
> This e-mail message and any attachments to it are intended only for the
> named recipients and may contain legally privileged and/or confidential
> information. If you are not one of the intended recipients, do not
> duplicate or forward this e-mail message.
>
>
>
> This e-mail message and any attachments to it are intended only for the
> named recipients and may contain legally privileged and/or confidential
> information. If you are not one of the intended recipients, do not
> duplicate or forward this e-mail message.
>

Mime
View raw message