hadoop-mapreduce-issues mailing list archives

From "Devaraj K (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2775) [MR-279] Decommissioned node does not shutdown
Date Mon, 24 Oct 2011 06:10:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133844#comment-13133844 ]

Devaraj K commented on MAPREDUCE-2775:

Hi Vinod and Arun, thanks for the comments.

{quote}Shall we rename NodeAction.DECOMMISSION to SHUTDOWN?{quote}

This would be a good idea. We can generalize it.

{quote}Need to send a SHUTDOWN command to the nodes even if it is invalid at the RM at the time of
registration. This is a very common case: we exclude the node even before we start the cluster.
Please also add a test for the above.{quote}
An excluded node already shuts down today, because registration fails with an IOException.
Instead of throwing the IOException, we can send the shutdown command as part of
RegisterNodeManagerResponse.
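
A minimal sketch of that idea, assuming simplified stand-in types: RegisterResponse, the
NodeAction values and the method names below are hypothetical and are not the actual classes
in the patch. The RM answers an excluded host with a SHUTDOWN action, and the NM decides from
the response whether to keep running.

```java
import java.util.Collections;
import java.util.Set;

public class RegistrationSketch {
    // Hypothetical stand-in for the action carried in the registration response.
    enum NodeAction { NORMAL, SHUTDOWN }

    // Hypothetical, simplified stand-in for RegisterNodeManagerResponse.
    static final class RegisterResponse {
        final NodeAction action;
        RegisterResponse(NodeAction action) { this.action = action; }
    }

    // RM side: an excluded host gets SHUTDOWN in the response instead of an exception.
    static RegisterResponse register(String host, Set<String> excludedHosts) {
        if (excludedHosts.contains(host)) {
            return new RegisterResponse(NodeAction.SHUTDOWN);
        }
        return new RegisterResponse(NodeAction.NORMAL);
    }

    // NM side: keep running unless the RM asked for a shutdown.
    static boolean shouldKeepRunning(RegisterResponse response) {
        return response.action != NodeAction.SHUTDOWN;
    }

    public static void main(String[] args) {
        Set<String> excluded = Collections.singleton("host2");
        System.out.println("host1 keeps running: " + shouldKeepRunning(register("host1", excluded)));
        System.out.println("host2 keeps running: " + shouldKeepRunning(register("host2", excluded)));
    }
}
```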

{quote}If the node is not valid, the correct component to send a RMNodeEventType.DECOMMISSION event
to RMNode is NodeListManager. We can move this code out of ResourceTrackerService into NodeListManager.refreshNodes(),
sending events to all nodes that get decommissioned during refreshNodes(). This will also
ensure that the decommissioned-node-count gets incremented immediately instead of waiting for
all the nodes to reach RM. Your tests in TestResourceTrackerService also simplify a bit.{quote}
I agree with Arun's comments on this. In this case, we would need to establish a new communication
channel between RM and NM, other than the heartbeat, for generating events.
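
The refreshNodes() suggestion could look roughly like the sketch below. Everything in it is
an assumption made for illustration: the class, the event type and the in-memory event list
are simplified stand-ins, not the real NodeListManager or RMNode code. The point it shows is
that every active node found in the new exclude list gets a DECOMMISSION event during the
refresh, so the decommissioned count is updated right away.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RefreshNodesSketch {
    // Hypothetical stand-in for RMNodeEventType.
    enum RMNodeEventType { DECOMMISSION }

    // Hypothetical, simplified event record.
    static final class RMNodeEvent {
        final String nodeId;
        final RMNodeEventType type;
        RMNodeEvent(String nodeId, RMNodeEventType type) {
            this.nodeId = nodeId;
            this.type = type;
        }
    }

    private final Set<String> activeNodes = new HashSet<>();
    private int decommissionedCount = 0;
    // Stands in for an event dispatcher; events are only recorded here.
    final List<RMNodeEvent> dispatched = new ArrayList<>();

    void addActiveNode(String host) {
        activeNodes.add(host);
    }

    // Send a DECOMMISSION event to every active node that is now excluded.
    void refreshNodes(Set<String> excludedHosts) {
        // Iterate over a copy so that nodes can be removed from activeNodes safely.
        for (String host : new ArrayList<>(activeNodes)) {
            if (excludedHosts.contains(host)) {
                activeNodes.remove(host);
                dispatched.add(new RMNodeEvent(host, RMNodeEventType.DECOMMISSION));
                decommissionedCount++; // counted during the refresh itself
            }
        }
    }

    int getDecommissionedCount() {
        return decommissionedCount;
    }

    public static void main(String[] args) {
        RefreshNodesSketch manager = new RefreshNodesSketch();
        manager.addActiveNode("host1");
        manager.addActiveNode("host2");
        Set<String> excluded = new HashSet<>();
        excluded.add("host2");
        manager.refreshNodes(excluded);
        System.out.println("decommissioned count: " + manager.getDecommissionedCount());
    }
}
```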


{quote}TestNodeStatusUpdater: The two second sleeps are error prone. I think it should simply wait
till heartBeatID becomes more than 3 or a timeout occurs.
Similarly in TestNMExpiry, you should spin around till lost-nodes' count becomes two or a
timeout happens.
TestResourceTrackerService is good work!
checkDecommissionedNMCount(): Again spin till the correct count or a timeout occurs.{quote}

Yes, I will address these points.
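
For illustration, the "spin until the condition holds or a timeout occurs" pattern can be
sketched with a small polling helper. The helper name waitFor, its parameters and the
simulated heartbeat thread are assumptions for this example and are not taken from the patch
or from the existing tests.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class WaitForSketch {
    // Poll the condition every pollMs until it holds or timeoutMs elapses.
    // Returns true if the condition became true, false on timeout or interrupt.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated heartbeat counter, incremented by a background thread.
        AtomicInteger heartBeatID = new AtomicInteger();
        Thread heartbeats = new Thread(() -> {
            for (int i = 0; i < 10; i++) {
                heartBeatID.incrementAndGet();
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        heartbeats.start();

        // Wait until more than 3 heartbeats were seen, or give up after 5 seconds.
        boolean reached = waitFor(() -> heartBeatID.get() > 3, 5000, 20);
        System.out.println("heartBeatID > 3 reached: " + reached);
    }
}
```

A test written this way finishes as soon as the condition is met and only fails after the
full timeout, which avoids both the flakiness and the fixed delay of a two second sleep.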

> [MR-279] Decommissioned node does not shutdown
> ----------------------------------------------
>                 Key: MAPREDUCE-2775
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2775
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ramya Sunil
>            Assignee: Devaraj K
>            Priority: Blocker
>             Fix For: 0.23.0
>         Attachments: MAPREDUCE-2775-1.patch, MAPREDUCE-2775-2.patch, MAPREDUCE-2775-3.patch,
>                      MAPREDUCE-2775-4.patch, MAPREDUCE-2775.patch, MAPREDUCE-2775.patch
> A Nodemanager which is decommissioned by an admin via refreshnodes does not automatically

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

