hadoop-mapreduce-issues mailing list archives

From "Devaraj K (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2775) [MR-279] Decommissioned node does not shutdown
Date Mon, 24 Oct 2011 06:10:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133844#comment-13133844
] 

Devaraj K commented on MAPREDUCE-2775:
--------------------------------------

Hi Vinod and Arun, thanks for the comments.

{quote}
Shall we rename NodeAction.DECOMMISSION to SHUTDOWN?{quote}

This would be a good idea. We can generalize it.

{quote}
Need to send a SHUTDOWN command to the nodes even if it is invalid at the RM at the time of
registration. This is a very common case, we exclude the node even before we start the cluster.
Please also add a test for the above.{quote}
An excluded node already shuts down, since an IOException is thrown at registration. We could
instead send the shutdown command as part of RegisterNodeManagerResponse rather than throwing IOException.
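
A rough sketch of the idea, not the actual patch: registration returns a response carrying a node action instead of failing with an exception. All class and method names here (RegisterResponse, TrackerSketch, register) are hypothetical stand-ins, not the real RegisterNodeManagerResponse or ResourceTrackerService APIs.

```java
import java.util.Set;

// Hypothetical action enum; SHUTDOWN follows the rename suggested above.
enum NodeAction { NORMAL, SHUTDOWN }

// Hypothetical stand-in for the registration response.
final class RegisterResponse {
    final NodeAction action;

    RegisterResponse(NodeAction action) {
        this.action = action;
    }
}

// Hypothetical stand-in for the RM-side registration handler.
final class TrackerSketch {
    private final Set<String> excludedHosts;

    TrackerSketch(Set<String> excludedHosts) {
        this.excludedHosts = excludedHosts;
    }

    RegisterResponse register(String host) {
        // Answer an excluded host with SHUTDOWN instead of throwing IOException.
        NodeAction action = excludedHosts.contains(host)
                ? NodeAction.SHUTDOWN
                : NodeAction.NORMAL;
        return new RegisterResponse(action);
    }
}
```

The NM side would then check the action in the response and stop itself on SHUTDOWN, which also covers nodes that were excluded before the cluster started.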



{quote}
If the node is not valid, the correct component to send a RMNodeEventType.DECOMMISSION event
to RMNode is NodeListManager. We can move this code out of ResourceTrackerService into NodeListManager.refreshNodes()
- sending events to all nodes that get decommissioned during refreshNodes(). This will also
ensure that the decommissioned-node-count gets incremented immediately instead of waiting for
all the nodes to reach RM. Your tests in TestResourceTrackerService also simplify a bit.
{quote}
I agree with Arun's comments on this. In that case, we would need to establish a new communication
path between RM and NM, other than the heartbeat, for generating events.
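
To make the quoted suggestion concrete, here is a minimal sketch of a refreshNodes() that raises a decommission event for every newly excluded node and bumps the count right away. NodeListSketch and its members are hypothetical; the real NodeListManager and RMNode event dispatch look different, and the event is modelled as a plain string only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the node list handling on the RM.
final class NodeListSketch {
    private final Set<String> activeHosts = new HashSet<>();
    private int decommissionedCount = 0;

    // Stand-in for an event dispatcher: records the events that would be sent.
    final List<String> dispatchedEvents = new ArrayList<>();

    void addActive(String host) {
        activeHosts.add(host);
    }

    int getDecommissionedCount() {
        return decommissionedCount;
    }

    void refreshNodes(Set<String> excludedHosts) {
        // Iterate over a copy so that removing from activeHosts is safe.
        for (String host : new ArrayList<>(activeHosts)) {
            if (excludedHosts.contains(host)) {
                activeHosts.remove(host);
                // Count is updated immediately, without waiting for a heartbeat.
                decommissionedCount++;
                dispatchedEvents.add("DECOMMISSION:" + host);
            }
        }
    }
}
```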

{quote}

TestNodeStatusUpdater: The two-second sleeps are error prone. I think it should simply wait
till heartBeatID becomes more than 3 or a timeout occurs.
Similarly in TestNMExpiry, you should spin around till lost-nodes' count becomes two or a
timeout happens.
TestResourceTrackerService is good work! 
checkDecommissionedNMCount(): Again spin till the correct count or a timeout occurs.
{quote} 

Yes, I will address this problem.
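
For reference, the spin-until-condition-or-timeout pattern suggested above could be sketched as below. WaitUtil is a hypothetical helper written for this comment, not an existing Hadoop test utility, and the poll interval and timeout values are arbitrary.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper replacing fixed Thread.sleep() calls.
final class WaitUtil {
    private WaitUtil() {}

    /**
     * Polls the condition every pollMillis milliseconds.
     * Returns true as soon as the condition holds, or false once
     * timeoutMillis has elapsed or the waiting thread is interrupted.
     */
    static boolean waitFor(BooleanSupplier condition, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

A test would then assert on something like WaitUtil.waitFor(() -> heartBeatID > 3, 100, 10000) instead of sleeping for two seconds, and the same helper would serve the lost-nodes count in TestNMExpiry and checkDecommissionedNMCount().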
                
> [MR-279] Decommissioned node does not shutdown
> ----------------------------------------------
>
>                 Key: MAPREDUCE-2775
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2775
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ramya Sunil
>            Assignee: Devaraj K
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2775-1.patch, MAPREDUCE-2775-2.patch, MAPREDUCE-2775-3.patch,
MAPREDUCE-2775-4.patch, MAPREDUCE-2775.patch, MAPREDUCE-2775.patch
>
>
> A Nodemanager which is decommissioned by an admin via refreshnodes does not automatically
shutdown. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
