hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4144) ResourceManager NPE while handling NODE_UPDATE
Date Tue, 17 Apr 2012 15:34:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255649#comment-13255649 ]

Jason Lowe commented on MAPREDUCE-4144:
---------------------------------------

Is the concern that with this change we won't remove the reservation or NODE_LOCAL request?
This could still have happened in the case where the node doesn't free up sufficient resources
before the application ends up finishing with containers on other nodes.  Assuming the app
doesn't complete first, I think the reservation will be cleaned up in assignReservedContainer()
either because there are no more outstanding requests at the same priority or it will fill
the reservation with an ANY request (since we know there aren't any more RACK_LOCAL requests
in this scenario).

But I might be misreading the code.  If it's critical to allocate the reserved container as
NODE_LOCAL once the node has enough free resources, we can undo this fix and put the rackLocal
null check in AppSchedulingInfo.allocateNodeLocal.
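
To make the alternative above concrete, here is a rough sketch of what a null check on the rack-local request could look like. It is a simplified stand-in, not the actual AppSchedulingInfo code: the class name, the map-of-counts representation of the request table, and the decrement helper are all assumptions made for illustration. Only the method name allocateNodeLocal and the NODE_LOCAL / RACK_LOCAL / ANY terminology come from this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a per-priority request table.
// Keys are resource names (a host, a rack, or "*" for ANY); values are
// the number of containers still wanted at that locality level.
final class RackLocalNullCheckSketch {
    static final String ANY = "*";

    // Decrement the outstanding count for one resource name.
    // The null check is the point of the sketch: if the entry has already
    // been removed (e.g. no more RACK_LOCAL requests), skip it instead of
    // dereferencing a null value.
    static void decrement(Map<String, Integer> requests, String resourceName) {
        Integer outstanding = requests.get(resourceName);
        if (outstanding == null) {
            return; // request already gone; nothing to update
        }
        if (outstanding <= 1) {
            requests.remove(resourceName);
        } else {
            requests.put(resourceName, outstanding - 1);
        }
    }

    // Assumed shape of a node-local allocation: one container satisfies the
    // host-level request and also counts against the rack-level and ANY
    // requests at the same priority.
    static void allocateNodeLocal(Map<String, Integer> requests,
                                  String host, String rack) {
        decrement(requests, host);
        decrement(requests, rack); // may be absent in the scenario discussed
        decrement(requests, ANY);
    }

    public static void main(String[] args) {
        Map<String, Integer> requests = new HashMap<>();
        requests.put("host1", 1);
        requests.put(ANY, 1);
        // No entry for "/rack1": without the null check this path would
        // throw a NullPointerException when unboxing the missing value.
        allocateNodeLocal(requests, "host1", "/rack1");
        System.out.println("remaining requests: " + requests);
    }
}
```

The trade-off is the one described above: guarding here keeps the NODE_LOCAL allocation path intact, at the cost of tolerating a request table whose locality levels are out of sync.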
                
> ResourceManager NPE while handling NODE_UPDATE
> ----------------------------------------------
>
>                 Key: MAPREDUCE-4144
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4144
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3
>
>         Attachments: MAPREDUCE-4144-testcase.patch, MAPREDUCE-4144.patch
>
>
> The RM on one of our clusters has exited twice in the past few days because of an NPE while trying to handle a NODE_UPDATE:
> {noformat}
> 2012-04-12 02:09:01,672 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
>  [ResourceManager Event Processor]java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:261)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:223)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.allocate(SchedulerApp.java:246)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1229)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignNodeLocalContainers(LeafQueue.java:1078)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1048)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:859)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:756)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:573)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:622)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:78)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:302)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}
> This is very similar to the failure reported in MAPREDUCE-3005.

