reef-dev mailing list archives

From "Markus Weimer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1223) IMRU Fault Tolerance - restart failed evaluators
Date Mon, 06 Jun 2016 15:45:21 GMT

    [ https://issues.apache.org/jira/browse/REEF-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316677#comment-15316677 ]

Markus Weimer commented on REEF-1223:
-------------------------------------

This issue links to a great many other issues, but I have trouble telling which of those
are blocking and which are merely related. The current "contains" relationship doesn't seem
to capture this, as much of the work listed there is either a bug fix (which should be "Blocking")
or merely "Related". It would help to have a clearer picture of what is "Blocking" so that
we can help accordingly :)

[~juliaw], do you have a clearer picture in your mind? Can you change the link labels?

> IMRU Fault Tolerance - restart failed evaluators
> ------------------------------------------------
>
>                 Key: REEF-1223
>                 URL: https://issues.apache.org/jira/browse/REEF-1223
>             Project: REEF
>          Issue Type: New Feature
>          Components: IMRU, REEF.NET
>            Reporter: Julia
>            Assignee: Julia
>              Labels: FT
>         Attachments: REEF Fault Tolerant Technical design.docx
>
>
> Currently, in the .NET Group Communication and IMRU scenario, if one of the Evaluators fails
> for whatever reason, all the Evaluators are killed by the driver.
> There are multiple levels of fault tolerance. The scenario we would like to support in
> this JIRA is:
> *  When an Evaluator fails, the failed Evaluator is killed and the other good Evaluators
>    stay, but all the tasks running on those Evaluators are stopped.
> *  A new Evaluator is requested and started with the original task.
> *  The same tasks are resubmitted to the rest of the Evaluators.
> *  The topology of those tasks is kept in the same group communication as before.
> *  The data that has been downloaded to the good Evaluators stays.
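
The recovery steps listed in the issue description can be sketched roughly as follows. This is only an illustrative model, not REEF or IMRU code: the `Evaluator` class, the `recover` and `topology` functions, and all field names are hypothetical, and the sketch assumes that the group communication topology is identified by task ids.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluator:
    """Hypothetical stand-in for an evaluator; not a REEF type."""
    evaluator_id: str
    task_id: str                                    # task assigned to this evaluator
    cached_data: set = field(default_factory=set)   # data already downloaded
    task_running: bool = True


def topology(evaluators):
    """Assumption: the topology is fully described by the set of task ids."""
    return sorted(ev.task_id for ev in evaluators.values())


def recover(evaluators, failed_id, new_id):
    """Apply the recovery steps from the issue to a dict of evaluators."""
    # 1. Kill only the failed evaluator; the good ones stay allocated.
    failed = evaluators.pop(failed_id)

    # 2. Stop the tasks on the surviving evaluators (their cached data stays).
    for ev in evaluators.values():
        ev.task_running = False

    # 3. Request a new evaluator and give it the failed evaluator's original task.
    evaluators[new_id] = Evaluator(new_id, failed.task_id, task_running=False)

    # 4. Resubmit the same tasks everywhere; task ids are unchanged, so the
    #    topology (as modelled here) is the same as before the failure.
    for ev in evaluators.values():
        ev.task_running = True
    return evaluators


if __name__ == "__main__":
    evs = {
        "e1": Evaluator("e1", "task-1", {"partition-1"}),
        "e2": Evaluator("e2", "task-2", {"partition-2"}),
    }
    before = topology(evs)
    recover(evs, failed_id="e2", new_id="e3")
    print(sorted(evs), topology(evs) == before)
```

In this model the replacement evaluator starts with an empty cache and would have to download its data again, while the surviving evaluators keep theirs, which is the saving the issue is after.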



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
