beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Amit Sela (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1582) ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.
Date Wed, 01 Mar 2017 22:24:45 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891188#comment-15891188
] 

Amit Sela commented on BEAM-1582:
---------------------------------

Looks like the flake happens when the entire input is re-read.
We inject 4 elements to Kafka before the first run, and 2 more before the second. When all
is well, printing the number of elements read by SparkUnboundedSource's {{readUnboundedStream}}
JavaDStream says 4 (sometimes 1 followed by 3) in the first run, and 2 in the second, but
in failures, it reads 6 in the second.
This would happen if the checkpoint of the readers are not persisted for some reason causing
the KafkaIO to use the default "earliest" and so read everything.
This happens even though checkpoint interval is batch interval. I will check if there's a
way to guarantee/block on checkpointing.

> ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-1582
>                 URL: https://issues.apache.org/jira/browse/BEAM-1582
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>
> See: https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-spark/2788/testReport/junit/org.apache.beam.runners.spark.translation.streaming/ResumeFromCheckpointStreamingTest/testWithResume/
> After some digging in it appears that a second firing occurs (though only one is expected)
but it doesn't come from a stale state (state is empty before it fires).
> Might be a retry happening for some reason, which is OK in terms of fault-tolerance guarantees
(at-least-once), but not so much in terms of flaky tests. 
> I'm looking into this hoping to fix this ASAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message