reef-dev mailing list archives

From "Mariia Mykhailova (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (REEF-1691) Should not request extra evaluators if evaluator failed at WaitingForEvaluator state
Date Tue, 17 Jan 2017 19:33:27 GMT

     [ https://issues.apache.org/jira/browse/REEF-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariia Mykhailova resolved REEF-1691.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 0.16

Resolved via [PR 1210|https://github.com/apache/reef/pull/1210]

> Should not request extra evaluators if evaluator failed at WaitingForEvaluator state
> ------------------------------------------------------------------------------------
>
>                 Key: REEF-1691
>                 URL: https://issues.apache.org/jira/browse/REEF-1691
>             Project: REEF
>          Issue Type: Bug
>            Reporter: Julia
>            Assignee: Julia
>              Labels: FT
>             Fix For: 0.16
>
>
> When Evaluators fail in both the WaitingForEvaluator state and TaskRunningState, recovery
> uses _failedEvaluatorsCount to decide how many new Evaluators to request. That number
> includes the Evaluators that failed in both states, even though replacements for the
> Evaluators that failed in the WaitingForEvaluator state have already been requested. As a
> result, extra Evaluators are requested. This is a regression caused by REEF-1677.
> With REEF-1688, even though we loosen the condition so that the additional Evaluators are
> ignored, an additional allocated Evaluator can still arrive in another state, because we
> change the system state right after we have received all the Evaluators needed. Receiving
> an additional allocated Evaluator in an unexpected state results in an IMRUSystemException.
> The fix is to request, in recovery, only replacements for the Evaluators that failed
> during or after task submission.
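
The counting problem described in the issue can be illustrated with a minimal sketch. This is not the REEF code, which is part of the .NET IMRU driver. The class, method, and state names below are hypothetical stand-ins chosen for illustration, and the sketch assumes that a replacement is requested immediately when an Evaluator fails while the driver is still waiting for Evaluators.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SystemState(Enum):
    # Hypothetical, simplified stand-ins for the driver states named in the issue.
    WAITING_FOR_EVALUATOR = auto()
    SUBMITTING_TASKS = auto()
    TASKS_RUNNING = auto()


@dataclass
class EvaluatorFailureTracker:
    """Hypothetical bookkeeping that splits failures by when they occurred."""

    failed_while_waiting: int = 0     # replacement already requested at failure time
    failed_after_submission: int = 0  # replacement still owed when recovery starts

    def record_failure(self, state: SystemState) -> int:
        """Record one failure and return how many Evaluators to request right away."""
        if state is SystemState.WAITING_FOR_EVALUATOR:
            self.failed_while_waiting += 1
            return 1  # assumption: the replacement is requested immediately
        self.failed_after_submission += 1
        return 0  # the replacement is deferred until recovery

    def total_failed(self) -> int:
        # Analogous to a single combined counter covering both states.
        return self.failed_while_waiting + self.failed_after_submission

    def evaluators_to_request_in_recovery(self) -> int:
        # Only failures during/after task submission still need a replacement.
        return self.failed_after_submission


tracker = EvaluatorFailureTracker()
requested = tracker.record_failure(SystemState.WAITING_FOR_EVALUATOR)
requested += tracker.record_failure(SystemState.TASKS_RUNNING)

# Requesting total_failed() in recovery would over-request by one Evaluator.
over_requested = requested + tracker.total_failed()
correctly_requested = requested + tracker.evaluators_to_request_in_recovery()
```

In this sketch two Evaluators fail, so exactly two replacements are needed. Using the combined count in recovery asks for one more than that, which is the extra Evaluator the issue describes arriving in an unexpected state.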




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
