hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers
Date Fri, 20 Jun 2014 18:35:26 GMT

    [ https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039150#comment-14039150
] 

Jian He commented on YARN-1367:
-------------------------------

Thanks for working the patch, some comments:

- can you fix the javac warning ?
- we should have ResourceTrackerService change in this patch also to send resync on non-work-preserving
case and resnc_keeping_containers in work-preserving case ?
-  These two loggings can be consolidated as one.
{code}
              LOG.warn("Node is out of sync with ResourceManager,"
                  + " hence resyncing. ShouldKeepContainers set to " +
                  keepContainers);
              LOG.warn("Message from ResourceManager: "
                  + response.getDiagnosticsMessage());
{code}
- code should be cleaner if using separate if case ?
{code}
            if (response.getNodeAction() == NodeAction.RESYNC || response
                .getNodeAction() == NodeAction.RESYNC_KEEPING_CONTAINERS) {
              boolean keepContainers = response.getNodeAction() == NodeAction
                  .RESYNC_KEEPING_CONTAINERS;
{code}
- Test case: we can explicitly check the container is indeed the container started before.
{code}
 if (containersShouldBePreserved) {
              Assert.assertFalse(containers.isEmpty());
{code}

some cosmetic suggestions :

- testPreserveContainersOnResyncKeepingContainers -> testKeepContainersOnResync
- we should rename testContainerPreservationOnResyncImpl also as it’s used in both killContainer
and keepContainer test case
- “// ensure that containers are empty before restart nodeStatusUpdater” this comment
is put before containersShouldBePreserved check which is confusing.
- blank line between two comments
{code}
      // Only containers should be killed on resync, apps should lie around. That

      // way local resources for apps can be used beyond resync without
      // relocalization
{code}
- In NodeManager.java, the convention should be to have the first  bracket follow the if statement.
{code}
          if (!keepContainers)
          {
            LOG.info("Cleaning up running containers on resync");
{code}
- can you remove those unused “@SuppressWarnings("unchecked”)”, there are a lot of them
after refactoring 
the following two lines can be in one line
- {code}
      nm.getNMDispatcher().getEventHandler().
          handle(resyncEvent);
{code}
- Maybe have two global variables for both commands, instead of instantiating them in each
test case?
{code}
   NodeManagerEvent resyncKeepingContainersEvent =
        new NodeManagerEvent(NodeManagerEventType.RESYNC_KEEPING_CONTAINERS);
NodeManagerEvent resyncEvent = new NodeManagerEvent(NodeManagerEventType.RESYNC);
{code}

> After restart NM should resync with the RM without killing containers
> ---------------------------------------------------------------------
>
>                 Key: YARN-1367
>                 URL: https://issues.apache.org/jira/browse/YARN-1367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1367.001.patch, YARN-1367.prototype.patch
>
>
> After RM restart, the RM sends a resync response to NMs that heartbeat to it.  Upon receiving
the resync response, the NM kills all containers and re-registers with the RM. The NM should
be changed to not kill the container and instead inform the RM about all currently running
containers including their allocations etc. After the re-register, the NM should send all
pending container completions to the RM as usual.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message