flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tzu-Li (Gordon) Tai (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-5017) Introduce WatermarkStatus stream element to allow for temporarily idle streaming sources
Date Fri, 04 Nov 2016 12:28:58 GMT
Tzu-Li (Gordon) Tai created FLINK-5017:

             Summary: Introduce WatermarkStatus stream element to allow for temporarily idle
streaming sources
                 Key: FLINK-5017
                 URL: https://issues.apache.org/jira/browse/FLINK-5017
             Project: Flink
          Issue Type: New Feature
          Components: Streaming
            Reporter: Tzu-Li (Gordon) Tai
            Assignee: Tzu-Li (Gordon) Tai
            Priority: Blocker
             Fix For: 1.2.0

A {{WatermarkStatus}} element informs receiving operators whether or not they should continue
to expect watermarks from the sending operator. There are 2 kinds of status, namely {{IDLE}}
and {{ACTIVE}}. Watermark status elements are generated at the sources, and may be propagated
through the operators of the topology using {{Output#emitWatermarkStatus(WatermarkStatus)}}.
Sources and downstream operators should emit either of the status elements once it changes
between "watermark-idle" and "watermark-active" states.

A source is considered "watermark-idle" if it will not emit records for an indefinite amount
of time. This is the case, for example, for Flink's Kafka Consumer, where sources might initially
have no assigned partitions to read from, or no records can be read from the assigned partitions.
Once the source detects that it will resume emitting data, it is considered "watermark-active".

Downstream operators with multiple inputs (ex. head operators of a {{OneInputStreamTask}}
or {{TwoInputStreamTask}}) should not wait for watermarks from an upstream operator that is
"watermark-idle" when deciding whether or not to advance the operator's current watermark.
When a downstream operator determines that all upstream operators are "watermark-idle" (i.e.
when all input channels have received the watermark idle status element), then the operator
is considered to also be "watermark-idle", as it will temporarily be unable to advance its
own watermark. This is always the case for operators that only read from a single upstream
operator. Once an operator is considered "watermark-idle", it should itself forward its idle
status to inform downstream operators. The operator is considered to be back to "watermark-active"
as soon as at least one of its upstream operators resume to be "watermark-active" (i.e. when
at least one input channel receives the watermark active status element), and should also
forward its active status to inform downstream operators.

This message was sent by Atlassian JIRA

View raw message