hadoop-yarn-issues mailing list archives

From "Jun Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
Date Mon, 20 Jul 2015 14:13:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633640#comment-14633640 ]

Jun Gong commented on YARN-3896:

Thanks [~devaraj.k] for the review and comments.

Uploaded a new patch to address your comments.

{quote}There are multiple sleep statements with hard-coded values in the newly added test code. Can you avoid these sleeps with hard-coded timeouts?{quote}
The sleep statements serve two purposes: 1. to simulate the RM being busy handling an RMNodeEvent; 2. to wait until that event has been processed. Is that reasonable?
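For reference, here is a common way to replace both kinds of sleep. It is a minimal sketch and assumes the test can run code inside the event handler. Instead of sleeping to simulate a busy RM, the handler blocks on a `CountDownLatch` that the test releases explicitly. Instead of sleeping to wait for completion, the test polls a condition with an upper timeout. The class and method names (`SleepFreeTestSketch`, `waitFor`, `runScenario`) are hypothetical and are not taken from the YARN-3896 patches.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class SleepFreeTestSketch {

  /** Hypothetical helper: polls the condition until it holds or timeoutMs elapses. */
  static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs); // short poll interval, not a fixed guess at timing
    }
  }

  /** Returns true if the event was still pending while "busy" and finished afterwards. */
  static boolean runScenario() throws Exception {
    CountDownLatch releaseHandler = new CountDownLatch(1); // keeps the stand-in RM busy
    AtomicBoolean handlerStarted = new AtomicBoolean(false);
    AtomicBoolean eventProcessed = new AtomicBoolean(false);

    // Stand-in for the RM dispatcher thread handling an RMNodeEvent.
    Thread dispatcher = new Thread(() -> {
      handlerStarted.set(true);
      try {
        releaseHandler.await(); // "busy" until the test releases it
        eventProcessed.set(true);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    dispatcher.start();

    // 1. The handler has provably started but cannot finish yet.
    waitFor(handlerStarted::get, 10, 5_000);
    boolean processedWhileBusy = eventProcessed.get(); // expected to be false

    // ... a real test would send the racing heartbeat at this point ...

    // 2. Release the handler, then wait for completion instead of sleeping.
    releaseHandler.countDown();
    waitFor(eventProcessed::get, 10, 5_000);
    dispatcher.join();
    return !processedWhileBusy && eventProcessed.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("ordered as expected: " + runScenario());
  }
}
```

With the latch, the test decides exactly when the handler may finish, so the ordering no longer depends on machine speed. The timeout only bounds how long a broken test can hang.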

> RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
> ---------------------------------------------------------------------------------------
>                 Key: YARN-3896
>                 URL: https://issues.apache.org/jira/browse/YARN-3896
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-3896.01.patch, YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch,
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved to /default-rack
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Reconnect from the node at:
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node 8041 httpPort: 8080) registered with capability: <memory:6144, vCores:60, diskCapacity:213>, assigned nodeId
> 2015-07-03 16:49:39,104 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Node Transitioned from RUNNING to REBOOTED
> {noformat}
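> The node( reconnected with the RM. When it re-registered, the RM reset its lastNodeHeartbeatResponse's response id to 0 asynchronously, but the node's next heartbeat arrived before the RM had finished resetting the id.
> The RM therefore still held response id 2506413, judged the NM's response id 0 to be "too far behind", and moved the node from RUNNING to REBOOTED.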
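
The race can be reproduced with a small single-threaded model. This is a sketch under stated assumptions. The `classify` condition is inferred from the "Too far behind rm response id:X nm response id:Y" log line. It treats an NM id one below the RM's as a duplicate and anything further behind as too far behind. It is not copied from `ResourceTrackerService`. `NodeRecord`, `simulate`, and the queue standing in for the async dispatcher are hypothetical names. The synchronous variant shows one way to close the window; it does not necessarily match what the attached patches do.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class ResponseIdRaceSketch {

  enum Verdict { NEW, DUPLICATE, TOO_FAR_BEHIND }

  /** Assumed comparison, inferred from the "Too far behind" log message. */
  static Verdict classify(int nmResponseId, int rmResponseId) {
    if (nmResponseId + 1 == rmResponseId) return Verdict.DUPLICATE;
    if (nmResponseId + 1 < rmResponseId) return Verdict.TOO_FAR_BEHIND;
    return Verdict.NEW;
  }

  /** Hypothetical stand-in for the RM-side node record. */
  static final class NodeRecord {
    int lastResponseId;
    String state = "RUNNING";
    NodeRecord(int lastResponseId) { this.lastResponseId = lastResponseId; }
  }

  /**
   * Simulates a reconnect followed by a heartbeat carrying NM response id 0.
   * When resetSynchronously is false, the reset is only queued (like an event
   * handed to an async dispatcher) and runs after the heartbeat is checked.
   */
  static String simulate(boolean resetSynchronously) {
    NodeRecord node = new NodeRecord(2506413); // RM response id from the log above
    Queue<Runnable> dispatcher = new ArrayDeque<>();

    // Reconnect/registration: the RM-side response id must go back to 0.
    Runnable reset = () -> node.lastResponseId = 0;
    if (resetSynchronously) {
      reset.run();
    } else {
      dispatcher.add(reset); // handled "later"
    }

    // The NM's first heartbeat after reconnecting is checked before the queue drains.
    if (classify(0, node.lastResponseId) == Verdict.TOO_FAR_BEHIND) {
      node.state = "REBOOTED";
    }

    // The dispatcher catches up, but the node has already been rebooted.
    while (!dispatcher.isEmpty()) {
      dispatcher.poll().run();
    }
    return node.state;
  }

  public static void main(String[] args) {
    System.out.println("async reset: " + simulate(false)); // REBOOTED
    System.out.println("sync reset:  " + simulate(true));  // RUNNING
  }
}
```

In the asynchronous case the check compares 0 + 1 against 2506413, so the node is treated as too far behind. Once the reset happens before the heartbeat is checked, both sides agree on id 0 and the node stays RUNNING.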

This message was sent by Atlassian JIRA
