hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-1354) Recover applications upon nodemanager restart
Date Tue, 15 Apr 2014 15:32:15 GMT

     [ https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-1354:

    Attachment: YARN-1354-v1.patch

Patch that persists applications to a leveldb state store when recovery is enabled. This
patch also addresses YARN-1355 because application ACLs are persisted as part of the app details.
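
For readers without the patch at hand, the idea can be sketched roughly as follows. This is an
illustration only, not code from YARN-1354-v1.patch: a sorted in-memory map stands in for leveldb,
and the class name, method names, key prefix, and value encoding are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for the leveldb store.
// None of these names are taken from the actual patch.
public class AppStateStoreSketch {
    // Illustrative key prefix, not the patch's real key schema.
    private static final String APP_PREFIX = "apps/";

    private final SortedMap<String, String> db = new TreeMap<>();
    private final boolean recoveryEnabled;

    public AppStateStoreSketch(boolean recoveryEnabled) {
        this.recoveryEnabled = recoveryEnabled;
    }

    // Persist the app details (user and ACLs together) under one key,
    // but only when recovery is enabled.
    public void storeApplication(String appId, String user, String acls) {
        if (!recoveryEnabled) {
            return;
        }
        db.put(APP_PREFIX + appId, user + "|" + acls);
    }

    // Drop the record once the application has finished on this node.
    public void removeApplication(String appId) {
        db.remove(APP_PREFIX + appId);
    }

    // On restart, scan every key under the prefix to rebuild the active set.
    public Map<String, String> loadApplications() {
        Map<String, String> apps = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : db.tailMap(APP_PREFIX).entrySet()) {
            if (!e.getKey().startsWith(APP_PREFIX)) {
                break; // past the end of the prefix range
            }
            apps.put(e.getKey().substring(APP_PREFIX.length()), e.getValue());
        }
        return apps;
    }
}
```

Storing the ACLs in the same record as the rest of the app details is what lets a single write
cover both issues in this sketch.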

The review for MAPREDUCE-5652 noted a potential issue with application completion events being
lost as the NM goes down. One way to mitigate that would be for the NM to send its list of active
applications to the RM when it registers. The RM could then tell the NM about any finished
applications in the registration response or on the next NM heartbeat. That is not yet addressed
in this initial patch, as I wanted to keep the patch size manageable and get some initial feedback.
After the feedback we can decide whether to address that corner case as part of this change
or in a follow-up JIRA.
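
A minimal sketch of that mitigation, to make the corner case concrete. It is not part of the
patch. The class and method names are hypothetical, and it assumes the RM treats any application
the NM reports but the RM no longer considers running as finished.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the registration-time reconciliation described above.
// Names are illustrative; this is not the RM/NM protocol code.
public class RegistrationReconcileSketch {

    // RM side: of the apps the NM reports as active, return those the RM
    // no longer considers running (assumed here to mean "finished").
    public static Set<String> finishedAppsToReport(Set<String> nmActiveApps,
                                                   Set<String> rmRunningApps) {
        Set<String> finished = new TreeSet<>(nmActiveApps);
        finished.removeAll(rmRunningApps);
        return finished;
    }

    // NM side: drop the apps the RM reported as finished, whether the list
    // arrived in the registration response or on a later heartbeat.
    public static void applyFinished(Set<String> nmActiveApps, Set<String> finished) {
        nmActiveApps.removeAll(finished);
    }
}
```

With this shape, a completion event lost while the NM was down is recovered the next time the NM
registers, because the stale application shows up in the difference between the two sets.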

> Recover applications upon nodemanager restart
> ---------------------------------------------
>                 Key: YARN-1354
>                 URL: https://issues.apache.org/jira/browse/YARN-1354
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1354-v1.patch
> The set of active applications in the nodemanager context needs to be recovered for
> work-preserving nodemanager restart.

This message was sent by Atlassian JIRA
