hadoop-yarn-issues mailing list archives

From "Wilfred Spiegelenburg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5136) Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
Date Tue, 15 Nov 2016 23:07:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668678#comment-15668678 ]

Wilfred Spiegelenburg commented on YARN-5136:
---------------------------------------------

I was thrown off track a bit by all the changes that were made to the locking in the scheduler
in YARN-3139.

Analysis shows that the issue is not yet resolved. Two situations can cause the problem
described above:
# If a {{removeApplicationAttempt}} call and a {{moveApplication}} call for the same attempt
are processed in that order in quick succession, the application attempt still holds
a queue reference but has already been removed from the queue's list of applications.
# If two {{removeApplicationAttempt}} calls arrive in quick succession, the application
still holds a queue reference but has already been removed from the queue's list of
applications.

In both cases, the second call must arrive before the {{removeApplication}} call is made.
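
To make the second situation concrete, here is a minimal sketch of the stale state. It is a toy model only: {{SketchQueue}}, {{SketchAttempt}} and {{removeAttempt}} are hypothetical names invented for illustration and are not the FairScheduler classes or their actual logic. It assumes only what is described above, namely that the first removal takes the attempt out of the queue's list while the attempt keeps its queue reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the double-removal problem; none of these classes are Hadoop's.
class DoubleRemoveSketch {

    /** Stand-in for a leaf queue that tracks its application attempts. */
    static class SketchQueue {
        private final List<SketchAttempt> apps = new ArrayList<>();

        void addApp(SketchAttempt attempt) {
            apps.add(attempt);
        }

        boolean contains(SketchAttempt attempt) {
            return apps.contains(attempt);
        }

        // As in the reported log, removing an unknown attempt is an illegal state.
        void removeApp(SketchAttempt attempt) {
            if (!apps.remove(attempt)) {
                throw new IllegalStateException(
                    "Given app to remove " + attempt + " does not exist in queue");
            }
        }
    }

    /** Stand-in for an application attempt that remembers its queue. */
    static class SketchAttempt {
        private final SketchQueue queue;

        SketchAttempt(SketchQueue queue) {
            this.queue = queue;
        }

        SketchQueue getQueue() {
            return queue;
        }
    }

    // Removes the attempt from its queue but leaves the queue reference in place.
    static void removeAttempt(SketchAttempt attempt) {
        attempt.getQueue().removeApp(attempt);
    }

    // Returns true if the stale state is observed and the second removal throws.
    static boolean secondRemoveThrows() {
        SketchQueue queue = new SketchQueue();
        SketchAttempt attempt = new SketchAttempt(queue);
        queue.addApp(attempt);

        removeAttempt(attempt); // first call succeeds

        // Stale state: the reference is still set, the list membership is gone.
        boolean stale = attempt.getQueue() == queue && !queue.contains(attempt);
        try {
            removeAttempt(attempt); // second call trusts the stale reference
            return false;
        } catch (IllegalStateException e) {
            return stale;
        }
    }

    public static void main(String[] args) {
        System.out.println("second removal throws: " + secondRemoveThrows());
    }
}
```

In this toy model, the first situation would fail the same way if the move step also removed the attempt from the queue it still references; that is an assumption of the sketch, not a statement about the real {{moveApplication}} code path.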

> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> -----------------------------------------------------------------
>
>                 Key: YARN-5136
>                 URL: https://issues.apache.org/jira/browse/YARN-5136
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: tangshangwen
>            Assignee: Wilfred Spiegelenburg
>
> Moving an app causes the RM to exit
> {noformat}
> 2016-05-24 23:20:47,202 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@ea94c3b does not exist in queue [root.bdp_xx.bdp_mart_xx_formal, demand=<memory:28672000, vCores:14000>, running=<memory:28647424, vCores:13422>, share=<memory:28672000, vCores:0>, w=<memory weight=1.0, cpu weight=1.0>]
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:119)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:779)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1231)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:114)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-05-24 23:20:47,202 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e04_1464073905025_15410_01_001759 Container Transitioned from ACQUIRED to RELEASED
> 2016-05-24 23:20:47,202 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
