hadoop-mapreduce-issues mailing list archives

From "Ramya Sunil (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3272) Lost NMs fail to rejoin
Date Wed, 30 Nov 2011 18:43:42 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160225#comment-13160225 ]

Ramya Sunil commented on MAPREDUCE-3272:

Jonathan, the issue I described occurs when the nodemanagers are lost, not killed, i.e.
kill -SIGSTOP 4029 rather than kill 4029. This suspends the nodemanager temporarily but does
not kill the process. To simulate the nodemanagers rejoining, I did not restart the
nodemanager; instead I used kill -SIGCONT 4029 to resume execution.
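
The suspend-and-resume procedure described above could be scripted roughly as follows. This is a minimal sketch, assuming a POSIX system; the helper name `suspend_and_resume` and its parameters are hypothetical, and the script only sends the same two signals as the kill commands in the comment.

```python
import os
import signal
import time


def suspend_and_resume(pid: int, pause_secs: float) -> None:
    """Suspend a process with SIGSTOP, wait, then resume it with SIGCONT.

    Hypothetical helper for illustration; it mirrors
    `kill -SIGSTOP <pid>` followed later by `kill -SIGCONT <pid>`.
    """
    os.kill(pid, signal.SIGSTOP)  # process is suspended, not terminated
    try:
        time.sleep(pause_secs)    # stay suspended for the chosen interval
    finally:
        # Always resume, even if the wait is interrupted.
        os.kill(pid, signal.SIGCONT)
```

To reproduce the report, the pause would presumably need to exceed the expiry interval shown in the RM log (600 secs), for example `suspend_and_resume(4029, 660)`, where 4029 is the nodemanager PID used in the comment.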
> Lost NMs fail to rejoin
> -----------------------
>                 Key: MAPREDUCE-3272
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3272
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ramya Sunil
>            Assignee: Jonathan Eagles
>            Priority: Critical
>             Fix For: 0.23.1
> Lost nodemanagers fail to join back. 
> When the NM is lost, RM log reads
> {noformat}
> INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:<host:port> Timed out after 600 secs
> INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing <host:port> of type EXPIRE
> INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Removed Node <host:port>
> INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: <host:port> Node Transitioned from RUNNING to LOST
> {noformat}
> When the NM joins back, RM log reads
> {noformat}
> INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node not found rebooting <host:port>
> {noformat}
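
The sequence in the quoted logs can be illustrated with a toy model. This is only a sketch of the behavior the logs suggest, not the Hadoop implementation: the class `ToyResourceTracker`, its methods, and the `"NORMAL"` / `"REBOOT"` return values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

EXPIRY_SECS = 600  # taken from "Timed out after 600 secs" in the RM log


@dataclass
class ToyResourceTracker:
    """Hypothetical stand-in for the RM's node bookkeeping; not Hadoop code."""

    # node id -> time of last heartbeat, in seconds
    last_heartbeat: dict = field(default_factory=dict)

    def register(self, node: str, now: float) -> None:
        self.last_heartbeat[node] = now

    def expire_stale(self, now: float) -> list:
        """Drop nodes whose last heartbeat is older than EXPIRY_SECS."""
        lost = [n for n, t in self.last_heartbeat.items()
                if now - t > EXPIRY_SECS]
        for n in lost:
            del self.last_heartbeat[n]  # cf. "Removed Node", RUNNING to LOST
        return lost

    def heartbeat(self, node: str, now: float) -> str:
        if node not in self.last_heartbeat:
            return "REBOOT"  # cf. "Node not found rebooting <host:port>"
        self.last_heartbeat[node] = now
        return "NORMAL"
```

In this model, a node that is suspended past the expiry interval and then resumed sends a heartbeat without re-registering, so it is no longer known to the tracker and is told to reboot, which matches the last log line above.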

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

