flink-issues mailing list archives

From "Elias Levy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6472) BoundedOutOfOrdernessTimestampExtractor does not bound out of orderliness
Date Tue, 09 May 2017 16:18:05 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002961#comment-16002961 ]

Elias Levy commented on FLINK-6472:

Ideally, Flink would provide an abstract class similar to {{BoundedOutOfOrdernessTimestampExtractor}}
that emits watermarks based on progress in event time rather than on a processing-time timer,
so that watermarks are spaced evenly in event time. It would also take a parameter that bounds,
in processing time, how long it waits before generating a watermark, so that a watermark is
still emitted when a lull in messages leaves nothing to trigger the event-time-driven watermark.

Alternatively, if the extractor could implement both {{AssignerWithPeriodicWatermarks}} and
{{AssignerWithPunctuatedWatermarks}}, then users could implement that logic themselves.
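
The combined logic described above could look roughly like the following. This is only a sketch in plain Java with no Flink dependency: the class name HybridWatermarkTracker, its parameters, and the onEvent/onTimer methods are hypothetical and are not part of any Flink API. How such a tracker would be wired into an assigner or operator is left open.

```java
import java.util.OptionalLong;

/**
 * Hypothetical helper (not a Flink class). Tracks the largest timestamp seen and
 * decides when a watermark should be emitted: either because event time has
 * advanced far enough, or because too much processing time has passed.
 */
final class HybridWatermarkTracker {
    private final long maxOutOfOrdernessMs;   // watermark lags the max timestamp by this much
    private final long minEventTimeAdvanceMs; // emit once the watermark can advance this far
    private final long maxIdleMs;             // processing-time bound between emissions

    private long maxTimestamp = Long.MIN_VALUE;
    private long lastEmittedWatermark = Long.MIN_VALUE;
    private long lastEmitProcessingTime;

    HybridWatermarkTracker(long maxOutOfOrdernessMs, long minEventTimeAdvanceMs,
                           long maxIdleMs, long startProcessingTimeMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        this.minEventTimeAdvanceMs = minEventTimeAdvanceMs;
        this.maxIdleMs = maxIdleMs;
        this.lastEmitProcessingTime = startProcessingTimeMs;
    }

    /** Event-driven path: called once per element, like a punctuated assigner. */
    OptionalLong onEvent(long eventTimestampMs, long nowMs) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMs);
        long candidate = maxTimestamp - maxOutOfOrdernessMs;
        boolean first = lastEmittedWatermark == Long.MIN_VALUE;
        if (first || candidate - lastEmittedWatermark >= minEventTimeAdvanceMs) {
            return emit(candidate, nowMs);
        }
        return OptionalLong.empty();
    }

    /** Timer-driven path: called from a processing-time timer to cover lulls. */
    OptionalLong onTimer(long nowMs) {
        if (maxTimestamp == Long.MIN_VALUE) {
            return OptionalLong.empty(); // nothing seen yet, nothing to emit
        }
        long candidate = maxTimestamp - maxOutOfOrdernessMs;
        boolean idleTooLong = nowMs - lastEmitProcessingTime >= maxIdleMs;
        if (idleTooLong && candidate > lastEmittedWatermark) {
            return emit(candidate, nowMs);
        }
        return OptionalLong.empty();
    }

    private OptionalLong emit(long watermark, long nowMs) {
        lastEmittedWatermark = watermark;
        lastEmitProcessingTime = nowMs;
        return OptionalLong.of(watermark);
    }
}
```

With a 1000 ms out-of-orderness bound, a 500 ms minimum advance, and a 200 ms idle limit, an event stream that jumps ahead in event time produces a watermark immediately, while a stream that goes quiet still gets a watermark from the timer path once 200 ms of processing time have elapsed.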

> BoundedOutOfOrdernessTimestampExtractor does not bound out of orderliness
> -------------------------------------------------------------------------
>                 Key: FLINK-6472
>                 URL: https://issues.apache.org/jira/browse/FLINK-6472
>             Project: Flink
>          Issue Type: Bug
>          Components: DataStream API
>    Affects Versions: 1.3.0
>            Reporter: Elias Levy
> {{BoundedOutOfOrdernessTimestampExtractor}} attempts to emit watermarks that lag behind
the largest observed timestamp by a configurable time delta.  It fails to do so in some circumstances.
> The class extends {{AssignerWithPeriodicWatermarks}}, which generates watermarks in periodic
intervals.  The timer for these intervals is a processing-time timer.
> In circumstances where there is a rush of events (restarting Flink, unpausing an upstream
producer, loading events from a file, etc.), many events with timestamps much larger than what
the configured bound would normally allow will be sent downstream without a watermark.  This
can have negative effects downstream, as operators may be buffering the events waiting for
a watermark to process them, thus leading to memory growth and possible out-of-memory conditions.
> It is probably best to have a bounded out of orderliness extractor that is based on the
punctuated timestamp extractor, so we can ensure that watermarks are generated in a timely
fashion in event time, with the addition of a processing-time timer to generate a watermark if
there has been a lull in events, thus also bounding the delay of generating a watermark in
processing time. 
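
The burst scenario in the description can be illustrated with a small simulation. This is a sketch in plain Java with no Flink dependency; the class PeriodicBurstDemo, its method, and the sample numbers are hypothetical and only model the behavior described above (a watermark computed as max timestamp minus the bound, emitted only when a processing-time timer fires).

```java
/** Hypothetical simulation (not Flink code) of periodic watermark emission. */
final class PeriodicBurstDemo {
    /**
     * Returns the largest (event timestamp - last emitted watermark) observed
     * when watermarks are emitted only every intervalMs of processing time.
     * Each arrival is {processingTimeMs, eventTimestampMs}, sorted by processing time.
     */
    static long maxLagBehindEvents(long[][] arrivals, long boundMs, long intervalMs) {
        long maxTs = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;
        long nextTimer = intervalMs;
        long worst = 0;
        for (long[] a : arrivals) {
            // Fire every periodic timer that is due before this event arrives.
            while (nextTimer <= a[0]) {
                if (maxTs != Long.MIN_VALUE) {
                    watermark = Math.max(watermark, maxTs - boundMs);
                }
                nextTimer += intervalMs;
            }
            maxTs = Math.max(maxTs, a[1]);
            // The event goes downstream now; measure how far ahead of the watermark it is.
            if (watermark != Long.MIN_VALUE) {
                worst = Math.max(worst, a[1] - watermark);
            }
        }
        return worst;
    }
}
```

For example, with a 1000 ms bound and a 200 ms timer interval, the arrivals {0,0}, {100,100}, {200,200} followed by a burst {210,10000}, {220,20000}, {230,60000} give a result of 60900: the last event is sent downstream 60.9 seconds of event time ahead of the most recent watermark, because no timer fires during the burst.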

This message was sent by Atlassian JIRA
