hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
Date Fri, 24 Jan 2014 08:17:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880797#comment-13880797 ]

Jian He commented on YARN-1618:
-------------------------------

bq. is it still the case that RPC servers are started after recovery is complete?
It is.

bq. The START should come almost immediately after the RMAppImpl object is created in a NEW
state during regular app submission. Karthik, are we sure that this happened?
Yes, it is.

bq. There is no need for history for an app that was never submitted successfully to the RM.
I agree. We don't need to save the final state of the app if the app is not even accepted
by the RM.

bq. If we don't want the store to be touched until the app is SUBMITTED/ACCEPTED (X), we
should probably replace the existing NEW_SAVING state with a corresponding X_SAVING state,
and re-jig the transitions to directly go to KILLED/FAILED from any of the states before this
X_SAVING state.
Regarding the two approaches Karthik proposed, I'm in favor of the first one.
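
For illustration only, the X_SAVING suggestion quoted above could be sketched as a
transition table. This is a hypothetical, simplified model and not the RMAppImpl state
machine: the class name, the collapsed X state (standing for SUBMITTED or ACCEPTED), and
the exact set of allowed transitions are assumptions made for the example.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; this is not the RMAppImpl state machine.
class AppTransitionSketch {
    // X stands for whichever of SUBMITTED/ACCEPTED is chosen in the quote above.
    enum State { NEW, X_SAVING, X, FINAL_SAVING, KILLED, FAILED }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        // Before X_SAVING nothing has been written to the store, so NEW goes
        // straight to a terminal state and never passes through FINAL_SAVING.
        ALLOWED.put(State.NEW, EnumSet.of(State.X_SAVING, State.KILLED, State.FAILED));
        // Assumed: once the save has started an entry may exist, so the final
        // state is saved before the app reaches a terminal state.
        ALLOWED.put(State.X_SAVING, EnumSet.of(State.X, State.FINAL_SAVING));
        ALLOWED.put(State.X, EnumSet.of(State.FINAL_SAVING));
        ALLOWED.put(State.FINAL_SAVING, EnumSet.of(State.KILLED, State.FAILED));
    }

    // True if the sketch permits a direct transition from one state to another.
    static boolean canTransition(State from, State to) {
        return ALLOWED.getOrDefault(from, EnumSet.noneOf(State.class)).contains(to);
    }

    // True for the states in which this sketch writes to the state-store.
    static boolean touchesStore(State s) {
        return s == State.X_SAVING || s == State.FINAL_SAVING;
    }
}
```

In this model, NEW to FINAL_SAVING is simply not a legal transition, so an update can
never be issued for an entry that was never created.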

> Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1618
>                 URL: https://issues.apache.org/jira/browse/YARN-1618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: yarn-1618-1.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed applications. In
> the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to
> update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the
> RM crashing.
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this
> can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should
> create the missing znode and handle the update.
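
The difference between a strict update and the "create the missing entry" behaviour
mentioned in the previous description can be sketched with a plain in-memory map. This
is a hypothetical stand-in, not ZKRMStateStore or the ZooKeeper client API; the class
and method names are invented for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for a state-store; not ZKRMStateStore.
class UpsertStoreSketch {
    private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

    // Strict update: fails when no entry exists at the path, which mirrors the
    // failure mode described for an app that goes from NEW to FINAL_SAVING.
    void updateStrict(String path, String data) {
        if (nodes.replace(path, data) == null) {
            throw new IllegalStateException("no entry at " + path);
        }
    }

    // Tolerant update: writes the data whether or not the entry exists.
    // Returns true when the entry had to be created.
    boolean updateOrCreate(String path, String data) {
        return nodes.put(path, data) == null;
    }

    // Returns the stored data, or null when there is no entry at the path.
    String get(String path) {
        return nodes.get(path);
    }
}
```

The tolerant variant hides the missing entry rather than preventing it, which is the
trade-off against changing the transitions so that the update is never issued.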



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
