hadoop-yarn-issues mailing list archives

From "Chris Riccomini (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
Date Thu, 02 May 2013 17:52:15 GMT

    [ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647750#comment-13647750 ]

Chris Riccomini commented on YARN-614:
--------------------------------------

Added a new patch. It resolves items 1 (switching justFinishedContainers to a map for O(1)
container status lookup) and 3 (adding a shouldIgnoreFailures method) from my list above.
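
For illustration only, the map change in item 1 might look roughly like the sketch below. It is not taken from the patch: the class name FinishedContainerIndex, the use of plain strings as container IDs, and the integer exit status are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not the actual YARN class from the patch.
class FinishedContainerIndex {
    // Exit status keyed by container ID (plain strings here, as an assumption).
    private final Map<String, Integer> justFinishedContainers = new HashMap<>();

    // Record the status of a container that has just finished.
    void containerFinished(String containerId, int exitStatus) {
        justFinishedContainers.put(containerId, exitStatus);
    }

    // Average O(1) lookup by ID, instead of scanning a list.
    // Returns null when the container is not known.
    Integer exitStatusOf(String containerId) {
        return justFinishedContainers.get(containerId);
    }
}
```

Keying by container ID avoids a linear scan each time a status is needed, which is the point of the O(1) lookup mentioned above.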

Bikas, I think we should leave recovery for another ticket.

Do you want me to update RMAppManager.recover() to have the same "if (app.attempts.size()
- app.ignoredFailures >= app.maxAppAttempts)" logic as RMAppImpl.AttemptFailedTransition?
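
As a rough sketch of the check quoted above (not the actual RMAppImpl or RMAppManager code), the condition could be modeled as follows. The class AppRetryPolicy, its method names, and the use of an int counter in place of app.attempts.size() are hypothetical; only the comparison itself comes from the comment.

```java
// Hypothetical model of the retry check; names are illustrative assumptions.
class AppRetryPolicy {
    private final int maxAppAttempts;
    private int attempts = 0;         // stands in for app.attempts.size()
    private int ignoredFailures = 0;  // failures that do not count against the app

    AppRetryPolicy(int maxAppAttempts) {
        this.maxAppAttempts = maxAppAttempts;
    }

    // Record a failed attempt. The caller decides whether the failure should be
    // ignored (for example, a hardware or YARN error rather than a user error).
    // Returns true when the application has used up its attempts and should fail.
    boolean onAttemptFailed(boolean shouldIgnoreFailure) {
        attempts++;
        if (shouldIgnoreFailure) {
            ignoredFailures++;
        }
        return attempts - ignoredFailures >= maxAppAttempts;
    }
}
```

With maxAppAttempts set to 1, an ignored failure leaves the difference at 0, so the application is retried; the next non-ignored failure brings it to 1 and the application fails.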
                
> Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-614
>                 URL: https://issues.apache.org/jira/browse/YARN-614
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>         Attachments: YARN-614-0.patch, YARN-614-1.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be retried
> unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or
> YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come
> to mind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
