flink-user mailing list archives

From Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>
Subject Re: Batch Processing Fault Tolerance (DataSet API)
Date Mon, 22 Feb 2016 17:33:19 GMT
Thank you, Till!

Does the current (in-progress) implementation also consider the problem of losing
the task slots of the failed node(s), i.e. something related to [2]?

[2] https://issues.apache.org/jira/browse/FLINK-3047

Best,
Ovidiu

> On 22 Feb 2016, at 18:13, Till Rohrmann <trohrmann@apache.org> wrote:
> 
> Hi Ovidiu,
> 
> At the moment, Flink's batch fault tolerance restarts the whole job in case of a failure.
> However, parts of the logic for partial backtracking, such as intermediate result partitions
> and the backtracking algorithm, are already implemented or exist as a PR [1]. So we hope to
> complete the partial backtracking soon.
> 
> [1] https://github.com/apache/flink/pull/640
> 
> Cheers,
> Till
> 
> On Mon, Feb 22, 2016 at 6:00 PM, Ovidiu-Cristian MARCU <ovidiu-cristian.marcu@inria.fr> wrote:
> Hi
> 
> In case of a node failure, what does 'Fault tolerance for programs in the DataSet
> API works by retrying failed executions' [1] mean?
> - work already done by the rest of the nodes is not lost, only the work of the lost node is
> recomputed, and job execution continues
> or
> - the entire job execution is retried
> 
> [1] https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/fault_tolerance.html
> 
> Best,
> Ovidiu 
> 
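
The two recovery behaviours contrasted in the thread (retrying the entire job versus recomputing only the lost work) can be illustrated with a toy model. This is only a sketch and not Flink code: the function names, the sequential list of tasks, and the "fails once" set are all hypothetical simplifications chosen for illustration.

```python
# Toy model only: NOT Flink's implementation. All names are hypothetical.

def run_with_full_restart(tasks, fails_once, max_retries):
    """Count task executions when any failure restarts the job from scratch."""
    executions = 0
    pending_failures = set(fails_once)  # each listed task fails exactly once
    for _attempt in range(max_retries + 1):
        failed = False
        for task in tasks:
            executions += 1
            if task in pending_failures:
                pending_failures.discard(task)
                failed = True
                break  # abandon this attempt; all finished work is discarded
        if not failed:
            return executions
    raise RuntimeError("job failed after exhausting retries")


def run_with_partial_recovery(tasks, fails_once, max_retries):
    """Count task executions when only the failed task is re-executed."""
    executions = 0
    pending_failures = set(fails_once)
    for task in tasks:
        for _attempt in range(max_retries + 1):
            executions += 1
            if task in pending_failures:
                pending_failures.discard(task)
                continue  # retry just this task; earlier results are kept
            break
        else:
            raise RuntimeError("task failed after exhausting retries")
    return executions


if __name__ == "__main__":
    tasks = ["a", "b", "c", "d"]
    print(run_with_full_restart(tasks, {"c"}, max_retries=3))
    print(run_with_partial_recovery(tasks, {"c"}, max_retries=3))
```

In this model the whole-job restart repeats tasks that had already finished, which is the cost that the partial backtracking work mentioned in the reply aims to avoid.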

