hadoop-yarn-issues mailing list archives

From "Anubhav Dhoot (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-556) RM Restart phase 2 - Work preserving restart
Date Fri, 01 Aug 2014 18:09:44 GMT

     [ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Anubhav Dhoot updated YARN-556:

    Attachment: YARN-1372.prelim.patch

The NM does not remove entries from its completedContainers list until the RM sends a new
field in the NodeHeartbeatResponse that tracks the container completions acked by the AM.
The RM AppAttempt tracks each completed container and its NodeId. This is sent to the AM, and
when the AM sends its next allocate call, that call is assumed to implicitly ack the previous
response. RMNode gets a new event to process this ack and send it to the NM via the heartbeat
response, completing the cycle.
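
The cycle described above can be illustrated with a small, self-contained Java simulation. This is only a sketch under stated assumptions, not the code in the attached patch: the class and method names (AckCycleSketch, NodeManagerSim, ResourceManagerSim, nodeHeartbeat, allocate) are hypothetical, a single node and a single application attempt are assumed, and containers are identified by plain strings instead of ContainerId/NodeId objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of the ack cycle; names are illustrative only. */
final class AckCycleSketch {

    /** NM side: keeps completed containers until the RM reports them as acked. */
    static final class NodeManagerSim {
        private final Set<String> completedContainers = new LinkedHashSet<>();

        void containerCompleted(String containerId) {
            completedContainers.add(containerId);
        }

        /** Every heartbeat re-reports all completions that have not been acked yet. */
        List<String> heartbeatPayload() {
            return new ArrayList<>(completedContainers);
        }

        /** Only completions listed in the heartbeat response are removed. */
        void onHeartbeatResponse(List<String> ackedByAm) {
            completedContainers.removeAll(ackedByAm);
        }
    }

    /** RM side: one app attempt, one node (both are simplifying assumptions). */
    static final class ResourceManagerSim {
        private final Set<String> justFinished = new LinkedHashSet<>(); // not yet sent to the AM
        private final Set<String> sentToAm = new LinkedHashSet<>();     // sent, awaiting implicit ack
        private final List<String> ackedForNode = new ArrayList<>();    // acked, to be sent to the NM

        /** Handles an NM heartbeat and returns the completions acked by the AM. */
        List<String> nodeHeartbeat(List<String> completedFromNm) {
            for (String id : completedFromNm) {
                // Ignore re-reports of completions that are already in flight.
                if (!sentToAm.contains(id) && !ackedForNode.contains(id)) {
                    justFinished.add(id);
                }
            }
            List<String> response = new ArrayList<>(ackedForNode);
            ackedForNode.clear();
            return response;
        }

        /** Handles an AM allocate call and returns newly completed containers. */
        List<String> allocate() {
            // A new allocate call implicitly acks whatever the previous response carried.
            ackedForNode.addAll(sentToAm);
            sentToAm.clear();

            List<String> response = new ArrayList<>(justFinished);
            sentToAm.addAll(justFinished);
            justFinished.clear();
            return response;
        }
    }

    public static void main(String[] args) {
        NodeManagerSim nm = new NodeManagerSim();
        ResourceManagerSim rm = new ResourceManagerSim();

        nm.containerCompleted("container_1");
        nm.onHeartbeatResponse(rm.nodeHeartbeat(nm.heartbeatPayload()));
        System.out.println("AM receives: " + rm.allocate());
        rm.allocate(); // the next allocate implicitly acks the previous response
        nm.onHeartbeatResponse(rm.nodeHeartbeat(nm.heartbeatPayload()));
        System.out.println("NM still holds: " + nm.heartbeatPayload());
    }
}
```

In this sketch the NM keeps reporting a completed container on every heartbeat until the ack arrives, so a completion is not lost if the RM restarts before the AM has seen it. The RM-side sets only deduplicate those repeated reports.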

> RM Restart phase 2 - Work preserving restart
> --------------------------------------------
>                 Key: YARN-556
>                 URL: https://issues.apache.org/jira/browse/YARN-556
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch,
> YARN-128 covered storing the state needed for the RM to recover critical information.
> This umbrella jira will track changes needed to recover the running state of the cluster so
> that work can be preserved across RM restarts.

This message was sent by Atlassian JIRA
