flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Neumann <mneum...@sics.se>
Subject Re: Event time in Flink streaming
Date Fri, 28 Aug 2015 15:35:09 GMT
The stream consists of logs from different machines with synchronized
clocks. As a result timestamps are not strictly increasing but there is a
bound on how much out of order they can be. (One aim is to detect events go
out of order more then a certain amount indication some problem in the
system setup)

I will look at the example policies and see if I can find a way to make it
work with 0.9.

I am aware of Google Dataflow and the discussion on Flink, though I just
recently learned more about the field, so I didn't have to much useful to
say. This might change if I get some more experience with the usecase I'm
working on.

cheers Martin

On Fri, Aug 28, 2015 at 5:06 PM, Aljoscha Krettek <aljoscha@apache.org>
wrote:

> Hi Martin,
> the answer depends, because the current windowing implementation has some
> problems. We are working on improving it in the 0.10 release, though.
>
> If your elements arrive with strictly increasing timestamps and you have
> parallelism=1 or don't perform any re-partitioning of data (which a
> groupBy() does, for example) then what Matthias proposed works for you. If
> not then you can get intro problems with out-of-order elements and windows
> will be incorrectly determined.
>
> If you are interested in what we are working on for 0.10, please look at
> the design documents here
> https://cwiki.apache.org/confluence/display/FLINK/Streams+and+Operations+on+Streams and
> here
> https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams.
> The basic idea is to make windows work correctly when elements arrive not
> ordered by timestamps. For this we want use watermarks as popularized, for
> example, by Google Dataflow.
>
> Please ask if you have questions about this or are interested in joining
> the discussion (the design as not yet finalized, both API and
> implementation). :D
>
> Cheers,
> Aljoscha
>
> P.S. I have some proof-of-concept work in a branch of mine, if you
> interested in my work there I could give you access to it.
>
> On Fri, 28 Aug 2015 at 16:11 Matthias J. Sax <
> mjsax@informatik.hu-berlin.de> wrote:
>
>> Hi Martin,
>>
>> you need to implement you own policy. However, this should be be
>> complicated. Have a look at "TimeTriggerPolicy". You just need to
>> provide a "Timestamp" implementation that extracts you ts-attribute from
>> the tuples.
>>
>> -Matthias
>>
>> On 08/28/2015 03:58 PM, Martin Neumann wrote:
>> > Hej,
>> >
>> > I have a stream of timestamped events I want to process in Flink
>> streaming.
>> > Di I have to write my own policies to do so, or can define time based
>> > windows to use the timestamps instead of the system time?
>> >
>> > cheers Martin
>>
>>

Mime
View raw message