hadoop-yarn-issues mailing list archives

From "Mayank Bansal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event
Date Mon, 10 Jun 2013 22:20:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679982#comment-13679982 ]

Mayank Bansal commented on YARN-502:
------------------------------------

Looking at the code, it looks like there is a race condition between ReconnectNodeTransition
and UnhealthyTransition in the event dispatcher.

This condition may arise when the NodeManager tries to register itself: ResourceTrackerService
puts the node in the nodes list and schedules the reconnect event. In the meantime, however,
an unhealthy event reaches the RM first and deletes this node from the nodes map.
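
To illustrate the ordering problem, here is a minimal sketch. It is not the actual FairScheduler
code. It assumes the NPE comes from dereferencing a node entry that an earlier event has already
removed from the map. The class NodeRemovalRaceSketch, the SchedulerNode record, and all method
names are hypothetical, and the null check is only one possible way to tolerate a stale removal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a scheduler's node bookkeeping; not the real FairScheduler.
class NodeRemovalRaceSketch {
    // Hypothetical per-node record holding the capacity the scheduler accounts for.
    static final class SchedulerNode {
        final int memoryMb;
        SchedulerNode(int memoryMb) { this.memoryMb = memoryMb; }
    }

    // Events are assumed to be handled one at a time, so a plain HashMap is enough here.
    private final Map<String, SchedulerNode> nodes = new HashMap<>();
    private int clusterMemoryMb = 0;

    void addNode(String nodeId, int memoryMb) {
        nodes.put(nodeId, new SchedulerNode(memoryMb));
        clusterMemoryMb += memoryMb;
    }

    // Unguarded removal: assumes the node is still tracked.
    // Throws NullPointerException if an earlier event already removed it.
    void removeNodeUnguarded(String nodeId) {
        SchedulerNode node = nodes.remove(nodeId);
        clusterMemoryMb -= node.memoryMb;
    }

    // Guarded removal: a stale removal for an unknown node is ignored.
    // Returns true only if the node was tracked and has now been removed.
    boolean removeNodeGuarded(String nodeId) {
        SchedulerNode node = nodes.remove(nodeId);
        if (node == null) {
            return false;
        }
        clusterMemoryMb -= node.memoryMb;
        return true;
    }

    int clusterMemoryMb() { return clusterMemoryMb; }

    public static void main(String[] args) {
        NodeRemovalRaceSketch scheduler = new NodeRemovalRaceSketch();
        scheduler.addNode("YYYY:55680", 8192);

        // The first event (for example, the unhealthy transition) removes the node.
        scheduler.removeNodeGuarded("YYYY:55680");

        // A second, stale removal for the same node then arrives.
        try {
            scheduler.removeNodeUnguarded("YYYY:55680");
        } catch (NullPointerException e) {
            System.out.println("unguarded removal failed on a node that was already removed");
        }
        System.out.println("guarded removal applied: " + scheduler.removeNodeGuarded("YYYY:55680"));
    }
}
```

Whether the right fix is a guard like this in the scheduler or a change to how the two transitions
are ordered in the dispatcher is still open; the sketch only shows why the second removal fails.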

Thanks,
Mayank
                
> RM crash with NPE on NODE_REMOVED event
> ---------------------------------------
>
>                 Key: YARN-502
>                 URL: https://issues.apache.org/jira/browse/YARN-502
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.0.3-alpha
>            Reporter: Lohit Vijayarenu
>            Assignee: Mayank Bansal
>
> While running some tests and adding/removing nodes, we saw the RM crash with the exception below.
> We are testing with the fair scheduler and running hadoop-2.0.3-alpha.
> {noformat}
> 2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node YYYY:55680 as it is now LOST
> 2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: YYYY:55680 Node Transitioned from UNHEALTHY to LOST
> 2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@XXXX:50030
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
