hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
Date Wed, 04 Sep 2013 15:03:54 GMT

    [ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757839#comment-13757839 ]

Jason Lowe commented on YARN-540:
---------------------------------

Sorry for arriving late, but why wouldn't we want to implement choice (1) above (i.e., block
until the store confirms the app state is removed)? From an AM's perspective, that's the simplest
solution. Returning control to the AM early from the unregister call invites the AM to do
bad things with respect to a potential restart (e.g., the MR AM will remove its staging directory,
effectively preventing the restart from succeeding and leading the RM to believe the app failed).
The unregister call is a terminal call in the AM-RM protocol, so I think it's appropriate for
it not to return until the app is truly unregistered.
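A minimal sketch of choice (1), assuming a state store whose removal runs asynchronously on its own thread. StateStoreSketch, UnregisterHandlerSketch, removeAppAsync and storeApp are hypothetical names for illustration, not the real RMStateStore or ResourceManager API. The handler method only mirrors the name of the AM's finishApplicationMaster call. The point is that the unregister path joins on the removal future, so control returns to the AM only after the app state is gone from the store.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the RM state store (not the real RMStateStore API).
class StateStoreSketch {
    private final Set<String> storedApps = ConcurrentHashMap.newKeySet();
    // Single writer thread: removals happen asynchronously, like an event handled elsewhere.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    void storeApp(String appId) { storedApps.add(appId); }

    boolean contains(String appId) { return storedApps.contains(appId); }

    // Hypothetical async removal; the returned future completes once the app state is removed.
    CompletableFuture<Void> removeAppAsync(String appId) {
        return CompletableFuture.runAsync(() -> storedApps.remove(appId), writer);
    }

    void shutdown() { writer.shutdown(); }
}

// Hypothetical RM-side handler for the AM's terminal unregister call.
class UnregisterHandlerSketch {
    private final StateStoreSketch store;

    UnregisterHandlerSketch(StateStoreSketch store) { this.store = store; }

    // Choice (1): block until the store confirms removal, so the AM cannot
    // clean up (e.g. delete its staging directory) while a restart could still relaunch it.
    void finishApplicationMaster(String appId) {
        store.removeAppAsync(appId).join(); // waits for the removal to complete
    }
}
```

Usage: after `new UnregisterHandlerSketch(store).finishApplicationMaster("app_1")` returns, `store.contains("app_1")` is false, so a restart at that moment would find nothing to reload. The trade-off is that the AM's unregister latency now includes the store write.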
                
> Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.patch, YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, the RM may shut down and restart:
> the dispatcher is stopped before it can process the REMOVE_APP event. The next time the RM comes back,
> it reloads the existing state files even though the job has succeeded.
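The race in the issue description can be modeled, very loosely, as a queue of pending REMOVE_APP events that is discarded when the dispatcher stops. RaceSketch and its methods are hypothetical and single-threaded for clarity. This is not the real AsyncDispatcher. It only shows why an unregister that returns before REMOVE_APP is processed can leave a succeeded app in the persisted state.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical, single-threaded model of the race (not the real AsyncDispatcher).
class RaceSketch {
    final Set<String> persistedApps = new HashSet<>(); // state that survives an RM restart
    private final Queue<String> pendingRemoveApp = new ArrayDeque<>();
    private boolean stopped = false;

    // The unregister returns immediately; REMOVE_APP is only queued.
    void finishApplicationMaster(String appId) {
        pendingRemoveApp.add(appId);
    }

    // Process one queued REMOVE_APP event, but only while the dispatcher is running.
    void dispatchOne() {
        if (!stopped && !pendingRemoveApp.isEmpty()) {
            persistedApps.remove(pendingRemoveApp.poll());
        }
    }

    // RM shutdown: the dispatcher stops and any queued events are lost.
    void stop() {
        stopped = true;
        pendingRemoveApp.clear();
    }

    // On restart, every app still in the persisted state is reloaded.
    Set<String> recover() { return new HashSet<>(persistedApps); }
}
```

If `stop()` runs between `finishApplicationMaster("app_1")` and `dispatchOne()`, then `recover()` still contains `app_1`. If the event is dispatched first, it does not.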

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
