beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean-Baptiste Onofré (JIRA) <j...@apache.org>
Subject [jira] [Commented] (BEAM-3186) In-flight data loss when restoring from savepoint
Date Mon, 05 Feb 2018 12:35:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352322#comment-16352322
] 

Jean-Baptiste Onofré commented on BEAM-3186:
--------------------------------------------

[~aljoscha] thanks, I cherry-pick on release-2.3.0 branch.

> In-flight data loss when restoring from savepoint
> -------------------------------------------------
>
>                 Key: BEAM-3186
>                 URL: https://issues.apache.org/jira/browse/BEAM-3186
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Pawel Bartoszek
>            Assignee: Dawid Wysakowicz
>            Priority: Blocker
>             Fix For: 2.3.0
>
>         Attachments: restore_no_trigger.png, restore_with_trigger.png, restore_with_trigger_b.png
>
>
> *The context:*
> I want to count how many events of given type(A,B, etc) I receive every minute using
1 minute windows and AfterWatermark trigger with allowed lateness 1 min.
> *Data loss case*
> In the case below if there is at least one A element with the event time belonging to
the window 14:00-14:01 read from Kinesis stream after job is restored from savepoint the data
loss will not be observed for this key and this window.
> !restore_no_trigger.png!
> *Not data loss case*
> However, if no new A element element is read from Kinesis stream than data loss is observable.
> !restore_with_trigger.png!
> *Workaround*
> As a workaround we could configure early firings every X seconds which gives up to X
seconds data loss per key on restore.
> *My guess where the issue might be*
> I believe this is Beam-Flink integration layer bug. From my investigation I don't think
it's KinesisReader and possibility that it couldn't advance watermark. To prove that after
I restore from savepoint I sent some records for different key (B) for the same window as
shown in the pictures(14:00-14:01) without seeing trigger going off for restored window and
key A.
> My guess is that Beam after job is restored doesn't register flink event time timer for
restored window unless there is a new element (key) coming for the restored window.
> Please refer to [this gist|https://gist.github.com/pbartoszek/7ab88c8b6538039db1b383358d1d1b5a]
for test job that shows this behaviour.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message