hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing
Date Wed, 23 Dec 2015 10:19:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069470#comment-15069470
] 

Rohith Sharma K S commented on YARN-4497:
-----------------------------------------

Currently, if any error occurs while storing into the RMStateStore, the RMStateStore becomes FENCED,
so no further attempts are stored in the state store. The RMStateStore state machine only has
a transition from {{ACTIVE to FENCED}}; there is no {{FENCED to ACTIVE}} transition.
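
To make the flow above concrete, here is a minimal sketch of a store that fences itself on the first
failed write. It is not the real RMStateStore code: the class FencedStoreSketch, the method
storeAttempt and the writeSucceeds flag are all hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the behaviour described above; NOT the real RMStateStore API.
class FencedStoreSketch {
    enum State { ACTIVE, FENCED }

    private State state = State.ACTIVE;
    private final List<String> storedAttempts = new ArrayList<>();

    State getState() { return state; }

    List<String> getStoredAttempts() { return storedAttempts; }

    // Tries to store one attempt. writeSucceeds simulates the outcome of the underlying write.
    boolean storeAttempt(String attemptId, boolean writeSucceeds) {
        if (state == State.FENCED) {
            return false; // a fenced store accepts no further store requests
        }
        if (!writeSucceeds) {
            state = State.FENCED; // the only transition: ACTIVE -> FENCED, never back
            return false;
        }
        storedAttempts.add(attemptId);
        return true;
    }

    public static void main(String[] args) {
        FencedStoreSketch store = new FencedStoreSketch();
        store.storeAttempt("attempt1", true);
        store.storeAttempt("attempt2", false); // simulated store failure
        store.storeAttempt("attempt3", true);  // would succeed, but the store is already fenced
        System.out.println(store.getState() + " " + store.getStoredAttempts());
    }
}
```

Under this simplified model, a failure on attempt2 leaves only attempt1 in the store, because attempt3
is rejected once the store is fenced. Whether the real store always behaves this way is exactly the
question raised here.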

If I am missing anything in the flow, could you explain it in more detail? 

> RM might fail to restart when recovering apps whose attempts are missing
> ------------------------------------------------------------------------
>
>                 Key: YARN-4497
>                 URL: https://issues.apache.org/jira/browse/YARN-4497
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>
> Found the following problem while discussing YARN-3480.
> If the RM fails to store some attempts in RMStateStore, those attempts will be missing from
RMStateStore. For example, when storing attempt1, attempt2 and attempt3, the RM successfully stores
attempt1 and attempt3 but fails to store attempt2. When the RM restarts, *RMAppImpl#recover*
recovers attempts one by one, so in this case it recovers attempt1, then attempt2. When
recovering attempt2, it calls *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which
first looks up the attempt's ApplicationAttemptStateData. The lookup finds nothing, so the
*assert attemptState != null* check fails (*RMAppAttemptImpl#recover*, line 880).
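
The recovery failure described in the issue can be sketched as follows. This is a simplified model,
not the actual RMAppImpl or RMAppAttemptImpl code: the class RecoverSketch, the method
firstMissingAttempt, the string-valued attempt states and the attemptCount parameter are all
assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of sequential attempt recovery; NOT the real RMAppImpl#recover.
class RecoverSketch {

    // Walks attempt1..attemptN in order and returns the id of the first attempt
    // whose stored state is missing, or null if every attempt can be recovered.
    static String firstMissingAttempt(Map<String, String> storedStates, int attemptCount) {
        for (int i = 1; i <= attemptCount; i++) {
            String attemptId = "attempt" + i;
            String attemptState = storedStates.get(attemptId);
            if (attemptState == null) {
                // This is the point where the real code is reported to hit
                // "assert attemptState != null".
                return attemptId;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> storedStates = new LinkedHashMap<>();
        storedStates.put("attempt1", "state-of-attempt1");
        storedStates.put("attempt3", "state-of-attempt3"); // attempt2 was never stored
        System.out.println(firstMissingAttempt(storedStates, 3));
    }
}
```

Returning the missing id instead of asserting is a choice made for the sketch so that the gap is
visible; it is not a proposed fix for this issue.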



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
