hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
Date Fri, 22 May 2015 12:07:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556037#comment-14556037
] 

Junping Du commented on YARN-41:
--------------------------------

Thanks [~devaraj.k] for updating the patch! Just finish my review.
Two major comments:
{code}
+    // the isStopped check is for avoiding multiple unregistrations.
+    if (this.registeredWithRM && !this.isStopped && !isNMUnderSupervision())
{
+      unRegisterNM();
+    }
{code}
According to discussion above, I think we need to check "yarn.nodemanager.recovery.enabled"
as well. Because even NM is under supervision, if we disable nm recovery, we should continue
to unregister NM to RM.

{code}
      .addTransition(NodeState.RUNNING, NodeState.DECOMMISSIONED,
-         RMNodeEventType.DECOMMISSION,
+         EnumSet.of(RMNodeEventType.DECOMMISSION, RMNodeEventType.SHUTDOWN),
{code}
So, the node after shutdown will become DECOMMISSIONED as final state? I think we don't expect
these nodes show in DECOMMISSIONED list. Isn't it? May be we should have some new NodeState
as SHUTDOWN for this case. This could make changes incompatible, at least for behaviors and
UI. We may need to mark this JIRA as incompatible and document these changes somewhere when
patch is done.

Some minor comments:
Add tests for new PB objects UnRegisterNodeManagerRequestPBImpl, UnRegisterNodeManagerResponsePBImpl
into TestYarnServerApiClasses.java.

{code}
catch (YarnException e) {
+      throw new ServiceException(e);
+    } catch (IOException e) {
+      throw new ServiceException(e);
+    }
{code}
Better to replace with
{code}
catch (YarnException | IOException e) {
    throw new ServiceException(e);
}
{code} 

{code}
+  @Test
+  public void testUnRegisterNodeManager() throws Exception {
+    UnRegisterNodeManagerRequest request = recordFactory
+        .newRecordInstance(UnRegisterNodeManagerRequest.class);
+    assertNotNull(client.unRegisterNodeManager(request));
+
+    ResourceTrackerTestImpl.exception = true;
+    try {
+      client.unRegisterNodeManager(request);
+      fail("there  should be YarnException");
+    } catch (YarnException e) {
+      assertTrue(e.getMessage().startsWith("testMessage"));
+    } finally {
+      ResourceTrackerTestImpl.exception = false;
+    }
+  }
{code}
If other exception get thrown here with wrong message, the test would still pass. Isn't it?
better to catch all exceptions and check if it is YarnException. 

{code}
+  private void unRegisterNM() {
+    RecordFactory recordFactory = RecordFactoryPBImpl.get();
+    UnRegisterNodeManagerRequest request = recordFactory
+        .newRecordInstance(UnRegisterNodeManagerRequest.class);
+    request.setNodeId(this.nodeId);
+    try {
+      resourceTracker.unRegisterNodeManager(request);
+      LOG.info("Successfully Unregistered the Node with ResourceManager");
+    } catch (Exception e) {
+      LOG.warn("Unregistration of Node failed.", e);
+    }
+  }
{code}
Put nodeId in the log could help in trouble shooting, also add the missing period in log.

> The RM should handle the graceful shutdown of the NM.
> -----------------------------------------------------
>
>                 Key: YARN-41
>                 URL: https://issues.apache.org/jira/browse/YARN-41
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Ravi Teja Ch N V
>            Assignee: Devaraj K
>         Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch,
YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch,
YARN-41.patch
>
>
> Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown
gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message