hadoop-yarn-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-7382) NoSuchElementException in FairScheduler after failover causes RM crash
Date Mon, 23 Oct 2017 23:38:00 GMT
Robert Kanter created YARN-7382:
-----------------------------------

             Summary: NoSuchElementException in FairScheduler after failover causes RM crash
                 Key: YARN-7382
                 URL: https://issues.apache.org/jira/browse/YARN-7382
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 2.9.0, 3.0.0
            Reporter: Robert Kanter
            Assignee: Robert Kanter
            Priority: Blocker


If an RM failover occurs while an MR job (e.g. sleep) is running, then once the maps get to 100%,
the now-active RM crashes with:
{noformat}
2017-10-18 15:02:05,347 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1508361403235_0001_01_000002 Container Transitioned from RUNNING to COMPLETED
2017-10-18 15:02:05,347 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest  OPERATION=AM Released Container TARGET=SchedulerApp     RESULT=SUCCESS  APPID=application_1508361403235_0001   CONTAINERID=container_1508361403235_0001_01_000002      RESOURCE=<memory:1024, vCores:1>
2017-10-18 15:02:05,349 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher
java.util.NoSuchElementException
        at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
        at java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:371)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:901)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1326)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:371)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:221)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:221)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1019)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:887)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1104)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:128)
        at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
        at java.lang.Thread.run(Thread.java:748)
2017-10-18 15:02:05,360 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
{noformat}
This leaves the cluster with no RMs!
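
For reference, the top two frames of the trace are plain JDK behavior: ConcurrentSkipListSet.first() throws NoSuchElementException when the set is empty, so the set read in AppSchedulingInfo.getNextPendingAsk held no elements at the time of the call. Why it was empty after the failover is not established here. The following is a minimal standalone sketch of that behavior and of one possible guarded lookup. It is not the YARN code; the class EmptySkipListSetDemo and the helpers firstThrows and firstOrNull are hypothetical names used only for illustration.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical demo class; not part of Hadoop YARN.
class EmptySkipListSetDemo {

    /** Returns true if first() on the given set throws NoSuchElementException. */
    static boolean firstThrows(ConcurrentSkipListSet<Integer> keys) {
        try {
            keys.first();
            return false;
        } catch (NoSuchElementException e) {
            // Same exception type as in the top frames of the trace above.
            return true;
        }
    }

    /**
     * Guarded lookup: returns the lowest element, or null if the set is empty.
     * Catching the exception avoids a race between isEmpty() and first()
     * when other threads modify the set concurrently.
     */
    static Integer firstOrNull(ConcurrentSkipListSet<Integer> keys) {
        try {
            return keys.first();
        } catch (NoSuchElementException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> keys = new ConcurrentSkipListSet<>();
        System.out.println("empty set, first() throws: " + firstThrows(keys));
        System.out.println("empty set, firstOrNull(): " + firstOrNull(keys));

        keys.add(3);
        keys.add(1);
        System.out.println("non-empty set, firstOrNull(): " + firstOrNull(keys));
    }
}
```

Whether a guard like firstOrNull is the right fix in the scheduler, or whether the empty set points to state that was not recovered after the failover, is left open by this sketch.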




