hadoop-yarn-issues mailing list archives

From "Jun Gong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing
Date Wed, 23 Dec 2015 03:40:46 GMT
Jun Gong created YARN-4497:
------------------------------

             Summary: RM might fail to restart when recovering apps whose attempts are missing
                 Key: YARN-4497
                 URL: https://issues.apache.org/jira/browse/YARN-4497
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Jun Gong
            Assignee: Jun Gong


Found the following problem while discussing YARN-3480.

If RM fails to store some attempts in RMStateStore, those attempts will be missing from
RMStateStore. For example, when storing attempt1, attempt2 and attempt3, RM successfully stores
attempt1 and attempt3 but fails to store attempt2. When RM restarts, *RMAppImpl#recover* recovers
attempts one by one, so in this case it recovers attempt1, then attempt2. When recovering
attempt2, it calls *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which first looks
up the attempt's ApplicationAttemptStateData. That data cannot be found, so an error occurs at
*assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880).
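
The failure mode above can be illustrated with a small standalone sketch. This is not the
actual RM code: the class MissingAttemptSketch, the AttemptState type, and the methods
storeWith and recoverAttempts are hypothetical stand-ins, and the sketch assumes that attempt
ids are regenerated sequentially starting at 1 during recovery. An IllegalStateException
stands in for the failed assert.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MissingAttemptSketch {

    /** Hypothetical stand-in for the per-attempt data kept in the state store. */
    static final class AttemptState {
        final int attemptId;

        AttemptState(int attemptId) {
            this.attemptId = attemptId;
        }
    }

    /** Builds a toy "state store" that holds only the given attempt ids. */
    static Map<Integer, AttemptState> storeWith(int... attemptIds) {
        Map<Integer, AttemptState> store = new HashMap<>();
        for (int id : attemptIds) {
            store.put(id, new AttemptState(id));
        }
        return store;
    }

    /**
     * Recovers attempts 1..attemptCount one by one (assumption: ids are
     * sequential). Throws as soon as an attempt has no stored state, which
     * mirrors the failed "attemptState != null" check described above.
     */
    static List<Integer> recoverAttempts(Map<Integer, AttemptState> store, int attemptCount) {
        List<Integer> recovered = new ArrayList<>();
        for (int id = 1; id <= attemptCount; id++) {
            AttemptState state = store.get(id);
            if (state == null) {
                throw new IllegalStateException("no stored state for attempt" + id);
            }
            recovered.add(state.attemptId);
        }
        return recovered;
    }

    public static void main(String[] args) {
        // attempt2 was never stored, so recovery stops after attempt1.
        Map<Integer, AttemptState> store = storeWith(1, 3);
        try {
            recoverAttempts(store, 3);
        } catch (IllegalStateException e) {
            System.out.println("Recovery failed: " + e.getMessage());
        }
    }
}
```

In this sketch attempt1 is recovered, then the lookup for attempt2 returns null and recovery
aborts, even though attempt3 is present in the store.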




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
