reef-dev mailing list archives

From "Julia (JIRA)" <j...@apache.org>
Subject [jira] [Created] (REEF-1248) Identify the scenarios that need to restart evaluators
Date Fri, 11 Mar 2016 21:01:16 GMT
Julia created REEF-1248:
---------------------------

             Summary: Identify the scenarios that need to restart evaluators
                 Key: REEF-1248
                 URL: https://issues.apache.org/jira/browse/REEF-1248
             Project: REEF
          Issue Type: Task
            Reporter: Julia


a.	Any transient app error should have retry logic inside the code. If it still fails after
retrying, restarting the server won't help.
b.	Any expected app exception should be treated as non-recoverable
c.	Unexpected app exceptions should also be treated as non-recoverable
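
The retry rule in (a) through (c) could be sketched roughly as follows. This is only an illustration in Python, not REEF code: the exception classes, the function name run_with_retry, and the max_attempts parameter are all hypothetical names chosen for the example.

```python
class TransientAppError(Exception):
    """Hypothetical marker for an app error that may succeed on retry."""


class NonRecoverableAppError(Exception):
    """Hypothetical marker: restarting the evaluator would not help."""


def run_with_retry(operation, max_attempts=3):
    """Run operation, retrying only transient app errors.

    Assumption: max_attempts is an illustrative bound, not a REEF setting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientAppError as err:
            # (a) Retry inside the code; once retries are exhausted,
            # a restart is not expected to help either.
            if attempt == max_attempts:
                raise NonRecoverableAppError(
                    f"still failing after {max_attempts} attempts"
                ) from err
        except Exception as err:
            # (b), (c) Expected and unexpected app exceptions are not retried.
            raise NonRecoverableAppError("application exception") from err
```

With this shape, a caller only ever sees a result or a NonRecoverableAppError, so app-level failures never need to trigger an evaluator restart.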

Resource issue
a.	The evaluator is killed by the RM. We should respond to this case

System Error
a.	System issue causing a machine crash
b.	Other system errors we encountered in the 10-month data testing. What exact events were received?
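
One possible way to write the scenarios above down as a lookup table is sketched below. It is an assumption-laden illustration, not a decision made in this issue: the enum names are hypothetical, and mapping "killed by RM" and "machine crash" to a restart is only a reading of the issue title. The other system errors are left as an open item because the exact events are still unknown.

```python
from enum import Enum


class FailureKind(Enum):
    # Hypothetical labels for the scenarios listed in this issue.
    APP_TRANSIENT_RETRIES_EXHAUSTED = "app transient error, retries exhausted"
    APP_EXPECTED_EXCEPTION = "expected app exception"
    APP_UNEXPECTED_EXCEPTION = "unexpected app exception"
    KILLED_BY_RM = "evaluator killed by RM"
    MACHINE_CRASH = "system issue causing a machine crash"
    OTHER_SYSTEM_ERROR = "other system error"


class Action(Enum):
    DO_NOT_RESTART = "do not restart"
    RESTART = "restart evaluator"
    INVESTIGATE = "identify the exact events first"


# Assumed policy: app errors are non-recoverable, resource and crash
# cases are restart candidates, everything else is still to be identified.
POLICY = {
    FailureKind.APP_TRANSIENT_RETRIES_EXHAUSTED: Action.DO_NOT_RESTART,
    FailureKind.APP_EXPECTED_EXCEPTION: Action.DO_NOT_RESTART,
    FailureKind.APP_UNEXPECTED_EXCEPTION: Action.DO_NOT_RESTART,
    FailureKind.KILLED_BY_RM: Action.RESTART,
    FailureKind.MACHINE_CRASH: Action.RESTART,
    FailureKind.OTHER_SYSTEM_ERROR: Action.INVESTIGATE,
}


def decide(kind):
    """Return the assumed action for a failure kind."""
    return POLICY[kind]
```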




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
