hadoop-mapreduce-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Amol Kekre (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-2693) NPE in AM causes it to lose containers which are never returned back to RM
Date Fri, 15 Jul 2011 22:34:01 GMT
NPE in AM causes it to lose containers which are never returned back to RM
--------------------------------------------------------------------------

                 Key: MAPREDUCE-2693
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2693
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
            Reporter: Amol Kekre
            Priority: Critical
             Fix For: 0.23.0


The following exception in AM of an application at the top of queue causes this. Once this
happens, AM keeps obtaining
containers from RM and simply loses them. Eventually on a cluster with multiple jobs, no more
scheduling happens
because of these lost containers.

It happens when there are blacklisted nodes at the app level in AM. A bug in AM
(RMContainerRequestor.containerFailedOnHost(hostName)) is causing this - nodes are simply
getting removed from the
request-table. We should make sure RM also knows about this update.

========================================================================
11/06/17 06:11:18 INFO rm.RMContainerAllocator: Assigned based on host match 98.138.163.34
11/06/17 06:11:18 INFO rm.RMContainerRequestor: BEFORE decResourceRequest: applicationId=30
priority=20
resourceName=... numContainers=4978 #asks=5
11/06/17 06:11:18 INFO rm.RMContainerRequestor: AFTER decResourceRequest: applicationId=30
priority=20
resourceName=... numContainers=4977 #asks=5
11/06/17 06:11:18 INFO rm.RMContainerRequestor: BEFORE decResourceRequest: applicationId=30
priority=20
resourceName=... numContainers=1540 #asks=5
11/06/17 06:11:18 INFO rm.RMContainerRequestor: AFTER decResourceRequest: applicationId=30
priority=20
resourceName=... numContainers=1539 #asks=6
11/06/17 06:11:18 ERROR rm.RMContainerAllocator: ERROR IN CONTACTING RM. 
java.lang.NullPointerException
        at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.decResourceRequest(RMContainerRequestor.java:246)
        at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.decContainerReq(RMContainerRequestor.java:198)
        at
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:523)
        at
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$200(RMContainerAllocator.java:433)
        at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:151)
        at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:220)
        at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message