hadoop-yarn-issues mailing list archives

From "Tsuyoshi OZAWA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers
Date Fri, 16 May 2014 11:23:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998631#comment-13998631 ]

Tsuyoshi OZAWA commented on YARN-1367:
--------------------------------------

Some comments on the patch:

1. Can you fix the indent?
{code}
+  public boolean isWorkPreservingRestartEnabled() { return
+      isWorkPreservingRestartEnabled;
+  }
{code}

{code}
+          if (!rmWorkPreservingRestartEnbaled)
+          {
+            containerManager.cleanupContainersOnNMResync();
+          }
{code}
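
For reference, a minimal sketch of how the two snippets above might look with two-space indentation and the opening brace on the same line as the statement. The wrapper classes ResyncFormattingSketch and ContainerManagerStub are hypothetical and exist only so the sketch compiles on its own. Only isWorkPreservingRestartEnabled() and cleanupContainersOnNMResync() come from the patch, and the sketch reuses one flag for both snippets as a simplification.

```java
// Hypothetical stand-in for the NM container manager. Only the method name
// cleanupContainersOnNMResync() is taken from the quoted patch.
class ContainerManagerStub {
  int cleanupCalls = 0;

  void cleanupContainersOnNMResync() {
    cleanupCalls++;
  }
}

// Hypothetical holder class, used only to make the snippet self-contained.
class ResyncFormattingSketch {
  private final boolean isWorkPreservingRestartEnabled;
  private final ContainerManagerStub containerManager;

  ResyncFormattingSketch(boolean enabled, ContainerManagerStub cm) {
    this.isWorkPreservingRestartEnabled = enabled;
    this.containerManager = cm;
  }

  // Getter with the return statement on its own line.
  public boolean isWorkPreservingRestartEnabled() {
    return isWorkPreservingRestartEnabled;
  }

  // Opening brace stays on the same line as the condition.
  void onResync() {
    if (!isWorkPreservingRestartEnabled) {
      containerManager.cleanupContainersOnNMResync();
    }
  }
}
```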

2. IMO, "recovery.work-preserving-restart.enabled" is more appropriate because this is one
of the options under the RECOVERY_ENABLED namespace.
{code}
  public static final String RM_WORK_PRESERVING_RECOVERY_ENABLED = RM_PREFIX
      + "work-preserving.recovery.enabled";
{code}
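
A rough sketch of what the suggested key could look like next to the existing recovery switch. RecoveryKeysSketch is a hypothetical class, and the RM_PREFIX and RECOVERY_ENABLED values are written out here as I understand them from YarnConfiguration, so treat them as assumptions rather than a copy of the real file.

```java
// Hypothetical container for the constants; not the real YarnConfiguration.
class RecoveryKeysSketch {
  // Assumed to match the RM prefix used by YarnConfiguration.
  static final String RM_PREFIX = "yarn.resourcemanager.";

  // Existing switch that the new option would sit beside (assumed value).
  static final String RECOVERY_ENABLED = RM_PREFIX + "recovery.enabled";

  // Key name proposed in this comment, grouped under "recovery.".
  static final String RM_WORK_PRESERVING_RECOVERY_ENABLED = RM_PREFIX
      + "recovery.work-preserving-restart.enabled";
}
```

With this naming both options share the "yarn.resourcemanager.recovery." prefix, so they sort together in the configuration files.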


> After restart NM should resync with the RM without killing containers
> ---------------------------------------------------------------------
>
>                 Key: YARN-1367
>                 URL: https://issues.apache.org/jira/browse/YARN-1367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1367.prototype.patch
>
>
> After RM restart, the RM sends a resync response to NMs that heartbeat to it. Upon receiving
> the resync response, the NM kills all containers and re-registers with the RM. The NM should
> be changed to not kill the container and instead inform the RM about all currently running
> containers including their allocations etc. After the re-register, the NM should send all
> pending container completions to the RM as usual.
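
The behaviour described in the issue can be sketched as a toy model. Everything here is hypothetical (the class NodeManagerResyncSketch, its fields, and the string messages) and it does not use any real YARN API. It only illustrates the ordering: keep containers, re-register with the running set, then send pending completions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an NM handling a resync response; not real YARN code.
class NodeManagerResyncSketch {
  final List<String> running = new ArrayList<>();
  final List<String> pendingCompletions = new ArrayList<>();
  // Records what would be sent to the RM, in order.
  final List<String> sentToRm = new ArrayList<>();

  void onResync(boolean workPreserving) {
    if (!workPreserving) {
      // Current behaviour: kill everything before re-registering.
      running.clear();
    }
    // Re-register, reporting whichever containers are still running.
    sentToRm.add("register" + running);
    // After the re-register, flush pending completions as usual.
    for (String containerId : pendingCompletions) {
      sentToRm.add("completed:" + containerId);
    }
    pendingCompletions.clear();
  }
}
```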



--
This message was sent by Atlassian JIRA
(v6.2#6252)
