hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3034) NM should act on a REBOOT command from RM
Date Fri, 03 Feb 2012 16:55:55 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199854#comment-13199854 ]

Robert Joseph Evans commented on MAPREDUCE-3034:
------------------------------------------------

I looked through the patch and I have a few comments.

Overall the patch looks very good.

In TestNodeStatusUpdater.java there are some tests that wait for a specific state:
{code}
while(null == rebootedNodeManager){
  Thread.sleep(1000);
}
{code}
In later parts of the test you log that you are waiting and put an upper limit on how long
you wait.  Please add at least an upper limit on how long this loop will wait as well.  I don't
want a test that hangs forever if someone breaks the code so that the state is never reached.
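
Something along these lines would do. This is only a sketch of a bounded polling loop, not code from the patch: BoundedWait, waitFor, pollMillis and timeoutMillis are names I made up for illustration, and it uses a Java 8 lambda-friendly interface for brevity.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the patch or of Hadoop.
final class BoundedWait {
  private BoundedWait() {}

  /**
   * Polls the condition every pollMillis until it holds or timeoutMillis
   * has elapsed. Returns true if the condition was met, false on timeout
   * or if the waiting thread was interrupted.
   */
  static boolean waitFor(BooleanSupplier condition, long pollMillis,
      long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // upper limit reached, give up instead of hanging
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return false;
      }
    }
    return true;
  }
}
```

In the test this would replace the bare loop with something like an assertTrue on waitFor(() -> rebootedNodeManager != null, 1000, 20000), where the 20 second limit is an arbitrary choice. The field being polled would also need to be visible across threads (for example, volatile).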

And a few minor nits: I don't really like the name initNStartNodeManager; I would prefer
initAndStartNodeManager.  Also, in that method I personally think that "if (!isRebooted)"
is more readable than "if (false == isRebooted)".
                
> NM should act on a REBOOT command from RM
> -----------------------------------------
>
>                 Key: MAPREDUCE-3034
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3034
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Devaraj K
>            Priority: Critical
>         Attachments: MAPREDUCE-3034-1.patch, MAPREDUCE-3034-2.patch, MAPREDUCE-3034.patch, MR-3034.txt
>
>
> RM sends a reboot command to NM in some cases, such as when the NM gets lost and then rejoins.
> In such a case, NM should act on the command and reboot/reinitialize itself.
> This is akin to TT reinitializing on an order from JT. We will need to shut down all the services
> properly and reinitialize; this should automatically take care of killing containers,
> cleaning up local temporary files, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
