reef-dev mailing list archives

From "Julia (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (REEF-1691) Should not request extra evaluators if evaluator failed at WatingForEvaluator state
Date Thu, 05 Jan 2017 01:13:58 GMT

     [ https://issues.apache.org/jira/browse/REEF-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julia reassigned REEF-1691:
---------------------------

    Assignee: Julia

> Should not request extra evaluators if evaluator failed at WatingForEvaluator state
> -----------------------------------------------------------------------------------
>
>                 Key: REEF-1691
>                 URL: https://issues.apache.org/jira/browse/REEF-1691
>             Project: REEF
>          Issue Type: Bug
>            Reporter: Julia
>            Assignee: Julia
>              Labels: FT
>
> When Evaluators fail in both the WaitingForEvaluator state and TaskRunningState, recovery
> uses _failedEvaluatorsCount to request new Evaluators. That count includes the Evaluators that
> failed in both states, even though replacements for the Evaluators that failed in the
> WaitingForEvaluator state have already been requested. As a result, extra Evaluators are
> requested. This is a regression caused by REEF-1677.
> With REEF-1688, even though we loosen the condition so that the additional Evaluators are
> ignored, an additional allocated Evaluator can still arrive in another state, because we change
> the system state as soon as we have all the Evaluators we need. Receiving an additional allocated
> Evaluator in such an unexpected state results in an IMRUSystemException.
> The fix is to request, in recovery, only the Evaluators that failed during or after task submission.
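
The accounting problem described above can be illustrated with a minimal sketch. This is Python for illustration only, not REEF's driver code: the SystemState members, the FailureCounter class, and all of its method names are hypothetical. It assumes, as the description states, that a replacement is requested immediately when an Evaluator fails while the driver is still waiting for Evaluators.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SystemState(Enum):
    # Hypothetical state names, loosely modeled on the states named in the issue.
    WAITING_FOR_EVALUATOR = auto()
    SUBMITTING_TASKS = auto()
    TASK_RUNNING = auto()


@dataclass
class FailureCounter:
    """Hypothetical bookkeeping for failed Evaluators (not REEF's actual code)."""

    replaced_while_waiting: int = 0  # replacement was requested at failure time
    pending_for_recovery: int = 0    # still needs a replacement during recovery

    def on_evaluator_failed(self, state: SystemState) -> int:
        """Record a failure and return how many Evaluators to request right now."""
        if state is SystemState.WAITING_FOR_EVALUATOR:
            self.replaced_while_waiting += 1
            return 1  # replacement requested immediately
        self.pending_for_recovery += 1
        return 0      # replacement deferred to recovery

    def evaluators_to_request_in_recovery(self) -> int:
        """Proposed fix: count only failures during or after task submission."""
        return self.pending_for_recovery

    def buggy_evaluators_to_request_in_recovery(self) -> int:
        """Regression: one combined count covers failures from both states."""
        return self.replaced_while_waiting + self.pending_for_recovery
```

With one failure while waiting and two while tasks are running, the combined count asks for three Evaluators in recovery although only two are still missing; the extra one is what can later arrive in an unexpected state.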




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
