hadoop-yarn-dev mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
Date Tue, 17 Sep 2013 21:07:53 GMT
Vinod Kumar Vavilapalli created YARN-1210:
---------------------------------------------

             Summary: During RM restart, RM should start a new attempt only when previous attempt exits for real
                 Key: YARN-1210
                 URL: https://issues.apache.org/jira/browse/YARN-1210
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Vinod Kumar Vavilapalli
            Assignee: Vinod Kumar Vavilapalli


When the RM recovers, it can wait for existing AMs to contact it again and then forcefully kill them
before starting a new AM. In the worst case, the RM will start a new AppAttempt after waiting for
10 minutes (the expiry interval). This minimizes the chance of multiple AMs racing with each other,
which should help avoid issues in downstream components like Pig, Hive and Oozie during RM restart.

Meanwhile, new apps will proceed as usual while existing apps wait for recovery.

This remains useful with work-preserving restart as well: AMs that can properly sync back up
with the RM continue to run, and those that cannot are guaranteed to be killed before a new
attempt is started.
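
The behaviour described above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the actual RM code: the class RecoveredAttemptPolicy, the Action enum and the decide() method are hypothetical names, and the amCanResync flag stands in for the work-preserving case. The 10 minute value is the expiry interval mentioned in the description.

```java
import java.time.Duration;

/** Hypothetical sketch; none of these names are real YARN classes. */
final class RecoveredAttemptPolicy {

    /** What the RM does next for an attempt it recovered after a restart. */
    enum Action { KEEP_WAITING, KILL_OLD_THEN_START_NEW, START_NEW, LET_OLD_CONTINUE }

    // Expiry interval cited in the description (10 minutes).
    static final Duration EXPIRY = Duration.ofMinutes(10);

    /**
     * @param amContacted   the previous AM has contacted the restarted RM
     * @param amCanResync   assumption: the AM can sync back up with the RM
     *                      (only meaningful with work-preserving restart)
     * @param sinceRecovery time elapsed since the RM recovered the app
     */
    static Action decide(boolean amContacted, boolean amCanResync, Duration sinceRecovery) {
        if (amContacted) {
            // The old AM is known to be alive: either let it keep running,
            // or kill it first so it cannot race with the new attempt.
            return amCanResync ? Action.LET_OLD_CONTINUE : Action.KILL_OLD_THEN_START_NEW;
        }
        if (sinceRecovery.compareTo(EXPIRY) >= 0) {
            // Worst case: the old AM never came back within the expiry interval.
            return Action.START_NEW;
        }
        // Not heard from yet and not expired: do not start a new attempt.
        return Action.KEEP_WAITING;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false, Duration.ofMinutes(3)));
        System.out.println(decide(true, false, Duration.ofMinutes(3)));
        System.out.println(decide(false, false, Duration.ofMinutes(10)));
        System.out.println(decide(true, true, Duration.ofMinutes(1)));
    }
}
```

The point of the sketch is that a new attempt is started only from the two states where the old AM is either killed or expired, so two AMs for the same app never run side by side.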

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
