hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
Date Thu, 15 Aug 2013 00:20:48 GMT

    [ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740474#comment-13740474 ]

Bikas Saha commented on YARN-1055:

Why does the launcher not retry the action? Is there an OOZIE jira to make it handle such
cases properly by doing its own bookkeeping? Isn't it more correct to fix OOZIE than to
add a workaround config in YARN?
Is the current situation acceptable as a known short-term bug? From what I see, nothing will
go wrong functionally or practically. In the infrequent case of the action-AM node crashing,
the pipeline would have to be restarted. We have a design for work-preserving RM restart that
can be completed post-beta, and it will remove the need to restart AMs. Given that, I am really
averse to adding a short-term workaround API in AppSubmissionContext that would have to be
maintained until YARN 3.0 comes out, because we are guaranteeing API stability post-beta.
> Handle app recovery differently for AM failures and RM restart
> --------------------------------------------------------------
>                 Key: YARN-1055
>                 URL: https://issues.apache.org/jira/browse/YARN-1055
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Karthik Kambatla
> Ideally, we would like to tolerate container, AM, and RM failures. App recovery for AM and
> RM currently relies on the max-attempts config; tolerating AM failures requires it to be > 1,
> and tolerating RM failure/restart requires it to be = 1.
> We should handle these two differently, with two separate configs.
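The description's proposal can be sketched as a small model in which the AM-failure retry budget and RM-restart recovery are two independent settings. This is a minimal, hypothetical illustration only. `AttemptPolicy`, `maxAmAttempts`, `recoverOnRmRestart` and `shouldStartNewAttempt` are invented names, not YARN APIs, and the sketch does not reflect how the ResourceManager actually counts attempts.

```java
/**
 * Hypothetical model of the two-config proposal in YARN-1055. None of these
 * names exist in YARN; they only illustrate keeping the AM-failure retry
 * budget independent of RM-restart recovery.
 */
final class AttemptPolicy {

    /** Why the previous application attempt ended. */
    enum Cause { AM_FAILURE, RM_RESTART }

    // Hypothetical config 1: how many AM failures the app may accumulate in total.
    private final int maxAmAttempts;
    // Hypothetical config 2: whether to start a new attempt after an RM restart.
    private final boolean recoverOnRmRestart;

    AttemptPolicy(int maxAmAttempts, boolean recoverOnRmRestart) {
        if (maxAmAttempts < 1) {
            throw new IllegalArgumentException("maxAmAttempts must be >= 1");
        }
        this.maxAmAttempts = maxAmAttempts;
        this.recoverOnRmRestart = recoverOnRmRestart;
    }

    /**
     * Decides whether to start a new attempt.
     *
     * @param cause           why the previous attempt ended
     * @param amFailuresSoFar AM failures counted so far, including the latest one
     */
    boolean shouldStartNewAttempt(Cause cause, int amFailuresSoFar) {
        switch (cause) {
            case RM_RESTART:
                // RM restarts are governed only by their own switch and do not
                // consume the AM-failure budget.
                return recoverOnRmRestart;
            case AM_FAILURE:
                // Retry only while AM failures stay below the configured budget.
                return amFailuresSoFar < maxAmAttempts;
            default:
                throw new IllegalArgumentException("unknown cause: " + cause);
        }
    }
}
```

Under these assumptions, a launcher that must not rerun its action after an AM failure but should survive an RM restart would use `new AttemptPolicy(1, true)`. That combination cannot be expressed with the single max-attempts value described above.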

