flink-issues mailing list archives

From "ryantaocer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10644) Batch Job: Speculative execution
Date Fri, 09 Nov 2018 03:50:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680808#comment-16680808 ]

ryantaocer commented on FLINK-10644:

An initial design doc:


> Batch Job: Speculative execution
> --------------------------------
>                 Key: FLINK-10644
>                 URL: https://issues.apache.org/jira/browse/FLINK-10644
>             Project: Flink
>          Issue Type: New Feature
>          Components: JobManager
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>             Fix For: 1.8.0
> Stragglers (outliers) are tasks that run slower than most of the other tasks in a batch job.
They increase job latency, because a straggler usually sits on the critical path
of the job and becomes the bottleneck.
> Tasks may be slow for various reasons, including hardware degradation, software misconfiguration,
or noisy neighbors. It is hard for the JM to predict the runtime.
> To reduce the impact of stragglers, other systems such as Hadoop/Tez and Spark provide *_speculative
execution_*. Speculative execution is a health-check procedure that looks for tasks to
speculate on, i.e. tasks in an ExecutionJobVertex (EJV) that run slower than the median of all
successfully completed tasks in that EJV. Such slow tasks are re-submitted to another TM. This does
not stop the slow task; it runs a new copy in parallel and kills the others once one of them completes.
> This JIRA is an umbrella issue for applying this idea in Flink. Details will be appended.
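
The detection rule in the quoted description (a running task becomes a speculation candidate when it is slower than the median of the successfully completed tasks in the same ExecutionJobVertex) could be sketched roughly as below. This is only an illustration and not Flink code: the class `SpeculationCheck`, its method names, and the `slowdownFactor` parameter are hypothetical names introduced here. A factor of 1.0 matches the rule as stated in the issue; a larger factor is an assumed knob to make the check less aggressive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink: flags running tasks that are
// slower than the median duration of the tasks that already finished.
class SpeculationCheck {

    /** Median of the durations (in ms) of the successfully completed tasks. */
    static double median(long[] completedMillis) {
        long[] sorted = completedMillis.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /**
     * Returns the indices of running tasks whose elapsed time exceeds
     * slowdownFactor * median(completedMillis). With no completed task
     * there is no baseline, so nothing is flagged.
     */
    static List<Integer> findSpeculationCandidates(
            long[] completedMillis, long[] runningElapsedMillis, double slowdownFactor) {
        List<Integer> candidates = new ArrayList<>();
        if (completedMillis.length == 0) {
            return candidates;
        }
        double threshold = median(completedMillis) * slowdownFactor;
        for (int i = 0; i < runningElapsedMillis.length; i++) {
            if (runningElapsedMillis[i] > threshold) {
                candidates.add(i);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        long[] completed = {100, 120, 110, 130}; // median = 115 ms
        long[] running = {90, 400, 115, 250};    // elapsed time of running tasks
        // Threshold is 115 * 1.5 = 172.5 ms, so tasks 1 and 3 are flagged.
        System.out.println(findSpeculationCandidates(completed, running, 1.5));
    }
}
```

In a real scheduler, each flagged task would then get a second attempt on another TM, and whichever attempt finishes first would cause the remaining attempts to be cancelled; that part is not shown here.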

This message was sent by Atlassian JIRA
