beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping
Date Thu, 06 Apr 2017 17:37:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959390#comment-15959390
] 

Kenneth Knowles commented on BEAM-1723:
---------------------------------------

The caches do need to be fault-tolerant or you'll get dupes.

It is simplest to have no configuration, but hard to say. I think there could be some discussion
here. The deduplication window is really about the potential for re-delivery of a message,
not like allowed lateness at all.

For example, in {{PubsubIO}} duplicates occur when output is committed but {{finalizeCheckpoint}}
does not succeed at ACKing all messages. Then Pubsub will redeliver the message.

> FlinkRunner should deduplicate when an UnboundedSource requires Deduping
> ------------------------------------------------------------------------
>
>                 Key: BEAM-1723
>                 URL: https://issues.apache.org/jira/browse/BEAM-1723
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Thomas Groh
>            Assignee: Jingsong Lee
>
> UnboundedSource implementations can require deduping, and the FlinkRunner currently logs
a warning that this is not supported.
> https://github.com/apache/beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L139



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message