hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby
Date Mon, 15 Dec 2014 23:20:14 GMT

    [ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247415#comment-14247415 ]

Jian He commented on YARN-2340:
-------------------------------

Today, the semantics of stopping a queue are to let the existing applications run to completion.
We should retain the same semantics across an RM restart as well. In this case, I think we need
to ignore this exception and continue, because the application was accepted before the queue
was changed to STOPPED. A similar problem could happen if we change the application ACL and restart
the RM while the application is running.
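
The behaviour proposed above can be sketched as a small toy model. This is only an illustration under stated assumptions: `ToyQueue`, `QueueStoppedException`, `addApplication` and the `isRecovering` flag are hypothetical names invented for this sketch, not the CapacityScheduler API and not the actual patch. It shows one way a STOPPED queue could keep rejecting new submissions while still letting recovery re-register an application that was accepted before the queue was stopped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names come from the real CapacityScheduler.
class StoppedQueueRecoverySketch {

    enum QueueState { RUNNING, STOPPED }

    /** Raised when a queue refuses a submission because it is STOPPED. */
    static class QueueStoppedException extends Exception {
        QueueStoppedException(String message) { super(message); }
    }

    static class ToyQueue {
        private final String name;
        private QueueState state = QueueState.RUNNING;
        private final List<String> applications = new ArrayList<>();

        ToyQueue(String name) { this.name = name; }

        void setState(QueueState newState) { this.state = newState; }

        List<String> getApplications() { return applications; }

        // The submission-time check: a STOPPED queue refuses submissions.
        void checkAcceptsSubmissions() throws QueueStoppedException {
            if (state == QueueState.STOPPED) {
                throw new QueueStoppedException("Queue " + name + " is STOPPED");
            }
        }
    }

    /**
     * Registers an application with the queue and reports whether it was added.
     * isRecovering is an assumed flag meaning "this application is being
     * replayed after a restart", as opposed to a brand-new submission.
     */
    static boolean addApplication(ToyQueue queue, String appId, boolean isRecovering) {
        try {
            queue.checkAcceptsSubmissions();
        } catch (QueueStoppedException e) {
            if (!isRecovering) {
                return false; // a genuinely new submission is still rejected
            }
            // Recovery: the application was accepted before the queue was
            // stopped, so ignore the exception and continue.
        }
        queue.getApplications().add(appId);
        return true;
    }

    public static void main(String[] args) {
        // Simulated state after a restart: queue "a" is already STOPPED.
        ToyQueue queue = new ToyQueue("a");
        queue.setState(QueueState.STOPPED);

        boolean recovered = addApplication(queue, "app_recovered", true);
        boolean submitted = addApplication(queue, "app_new", false);

        System.out.println("recovered app added: " + recovered);
        System.out.println("new app added: " + submitted);
        System.out.println("apps in queue: " + queue.getApplications());
    }
}
```

The point of the sketch is that the recovered application ends up registered with the queue, so a later attempt-added event would have an application to attach to, while the stop semantics for new submissions are unchanged.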

> NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's
and remain in standby
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2340
>                 URL: https://issues.apache.org/jira/browse/YARN-2340
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>    Affects Versions: 2.4.1
>         Environment: Capacityscheduler with Queue a, b
>            Reporter: Nishan Shetty
>            Assignee: Rohith
>            Priority: Critical
>
> While a job is in progress, set the queue state to STOPPED and then restart the RM.
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1406116264351_0014_000002 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
