hadoop-yarn-issues mailing list archives

From "Thomas Graves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-128) Resurrect RM Restart
Date Tue, 25 Sep 2012 02:41:09 GMT

    [ https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462375#comment-13462375 ]

Thomas Graves commented on YARN-128:

    What about AMs that completed during restart? Re-running them should be a no-op.
    AMs should not finish themselves while the RM is down or recovering. They should just spin.
Doesn't the RM still need to handle this? The client could stop the AM at any point by talking
directly to it. Or, since anyone can write an AM, it could simply finish on its own. Or there could
be a timing issue when the app finishes. How does the RM tell the difference? We can have the MR
client/AM handle this nicely, but even then there could be a bug, or an expiry after some period.
So perhaps if the AM is down it doesn't get restarted? That's probably not ideal if the app happens
to go down at the same time as the RM (say, a rack gets rebooted), but otherwise you have to handle
all the restart issues, like Bobby mentioned above.
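The trade-off in this comment can be written down as a small decision function. This is a hypothetical sketch only: the state names, the `restart_missing_ams` flag and `recovery_action` are illustrative and are not part of YARN or of any attached patch. It assumes the RM persisted an app's final status whenever the AM unregistered before the outage; the open question is what to do when no such record exists and the AM never reconnects.

```python
from enum import Enum, auto


class AmStatusAfterRmRestart(Enum):
    """Hypothetical states a recovered RM might observe for one app's AM."""
    FINISH_RECORDED = auto()  # AM unregistered and the RM persisted it before going down
    RESYNCED = auto()         # AM kept spinning and reconnected to the recovered RM
    MISSING = auto()          # AM never came back: it finished on its own, was stopped
                              # by the client, or died (e.g. its rack was rebooted)


class RecoveryAction(Enum):
    NO_OP = auto()           # nothing to re-run
    KEEP_RUNNING = auto()    # let the live AM carry on
    RESTART_AM = auto()      # launch a new AM attempt
    DO_NOT_RESTART = auto()  # leave the app down rather than guess


def recovery_action(status: AmStatusAfterRmRestart,
                    restart_missing_ams: bool) -> RecoveryAction:
    """Pick an action for one recovered app (illustrative policy only)."""
    if status is AmStatusAfterRmRestart.FINISH_RECORDED:
        # Re-running an AM that already completed should be a no-op.
        return RecoveryAction.NO_OP
    if status is AmStatusAfterRmRestart.RESYNCED:
        return RecoveryAction.KEEP_RUNNING
    # MISSING: without a persisted finish record the RM cannot tell a
    # deliberate finish from a crash, so the choice is a policy flag.
    # Restarting covers the rack-reboot case but means handling all the
    # restart issues; not restarting avoids re-running a finished app.
    if restart_missing_ams:
        return RecoveryAction.RESTART_AM
    return RecoveryAction.DO_NOT_RESTART
```

Keeping the lost-AM case behind an explicit flag makes the ambiguity visible: only the first two states give the RM enough information to act safely.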

> Resurrect RM Restart 
> ---------------------
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt
> We should resurrect 'RM Restart' which we disabled sometime during the RM refactor.

This message is automatically generated by JIRA.