hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
Date Mon, 09 Feb 2015 13:38:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312225#comment-14312225

Junping Du commented on YARN-41:

I think I misunderstood a little before. After checking again, your patch actually handles node
decommission, not "shutdown" (to avoid any confusion, let's define "shutdown" as running yarn daemon
stop or kill -9 on the NodeManager), so the patch here shouldn't affect the work on YARN-1336
(containers can keep running after an NM "shutdown", which is different from decommission).
From what I understand, the new flow in your current patch is: when a user decommissions a node,
the RM responds to the NM heartbeat with a SHUTDOWN message; the NM prepares to stop its services
and sends an unRegister message to the RM (via another RPC call) before killing itself; and the RM
(ResourceTrackerService) then tries to do some cleanup work.
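To make the message sequence concrete, here is a toy Python model of the flow as described above, recording each message in order. It is a sketch under stated assumptions: ToyRM, node_heartbeat, unregister_node_manager and nm_heartbeat_once are hypothetical names, not Hadoop's real ResourceTrackerService or NodeManager APIs, and the patch's actual cleanup may do more than drop the node from liveness tracking.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRM:
    """Hypothetical stand-in for the RM side (ResourceTrackerService)."""
    decommissioned: set = field(default_factory=set)  # filled by a refresh-nodes step
    live_nodes: set = field(default_factory=set)      # plays the NMLivenessMonitor role
    trace: list = field(default_factory=list)         # messages seen, in order

    def node_heartbeat(self, node_id: str) -> str:
        self.trace.append(f"{node_id} -> RM: heartbeat")
        action = "SHUTDOWN" if node_id in self.decommissioned else "NORMAL"
        self.trace.append(f"RM -> {node_id}: {action}")
        return action

    def unregister_node_manager(self, node_id: str) -> None:
        # Second RPC in this flow; in this toy model the RM's only cleanup
        # is dropping the node from liveness tracking.
        self.trace.append(f"{node_id} -> RM: unRegister")
        self.live_nodes.discard(node_id)


def nm_heartbeat_once(rm: ToyRM, node_id: str) -> bool:
    """Send one heartbeat; return True if the NM should stop afterwards."""
    if rm.node_heartbeat(node_id) == "SHUTDOWN":
        rm.unregister_node_manager(node_id)  # extra round trip before exiting
        return True
    return False


rm = ToyRM(decommissioned={"nm-1"}, live_nodes={"nm-1", "nm-2"})
nm_heartbeat_once(rm, "nm-1")
for line in rm.trace:
    print(line)
```

The trace makes the extra NM-to-RM unRegister call visible as a third message after the heartbeat/SHUTDOWN exchange.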
IMO, there are several concerns with this approach:
1. Another round of RPC between the NM and RM is unnecessary; the RM could do the same work (the
code in unRegisterNodeManager()) when it sends the SHUTDOWN message back.
2. Some of this work (like sending the DECOMMISSION event to RMNode) is already covered in
NodeListManager during the node decommission (refresh) operation. The only new work in
unRegisterNodeManager() seems to be unregistering the node from NMLivenessMonitor.
Am I missing anything?
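The alternative suggested in point 1 can be sketched with a similar toy model: the RM does the unRegisterNodeManager()-style cleanup itself while building the SHUTDOWN heartbeat response, so the NM sends no follow-up RPC. Again, every name here (ToyRM, node_heartbeat, nm_heartbeat_once, rpcs_from_nms) is hypothetical, and the sketch assumes, per point 2, that the DECOMMISSION event to RMNode was already sent during the refresh step.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRM:
    """Hypothetical RM model where cleanup happens during the heartbeat reply."""
    decommissioned: set = field(default_factory=set)
    live_nodes: set = field(default_factory=set)  # plays the NMLivenessMonitor role
    rpcs_from_nms: int = 0                         # RPCs initiated by NodeManagers

    def node_heartbeat(self, node_id: str) -> str:
        self.rpcs_from_nms += 1
        if node_id in self.decommissioned:
            # Unregister from liveness tracking here, before replying; the
            # DECOMMISSION event is assumed to have been sent by the refresh step.
            self.live_nodes.discard(node_id)
            return "SHUTDOWN"
        return "NORMAL"


def nm_heartbeat_once(rm: ToyRM, node_id: str) -> bool:
    """Return True if the NM should stop; no follow-up RPC is sent."""
    return rm.node_heartbeat(node_id) == "SHUTDOWN"


rm = ToyRM(decommissioned={"nm-1"}, live_nodes={"nm-1", "nm-2"})
stopped = nm_heartbeat_once(rm, "nm-1")
print(stopped, rm.rpcs_from_nms, sorted(rm.live_nodes))
```

In this model a decommissioned NM is cleaned up after a single NM-initiated RPC instead of two, which is the saving point 1 is arguing for.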

> The RM should handle the graceful shutdown of the NM.
> -----------------------------------------------------
>                 Key: YARN-41
>                 URL: https://issues.apache.org/jira/browse/YARN-41
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Ravi Teja Ch N V
>            Assignee: Devaraj K
>         Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch,
>                      YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41.patch
> Instead of waiting for the NM to expire, the RM should remove and handle an NM that has been shut down.

This message was sent by Atlassian JIRA
