hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daniel Templeton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
Date Wed, 28 Dec 2016 14:25:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782984#comment-15782984
] 

Daniel Templeton commented on YARN-6031:
----------------------------------------

Yep, tests are needed.  Love the long explanatory comment.  Do you think we can make the log
message a bit more explicit, i.e. say that the failure was because node labels have been disabled
and point out the property that the admin should use to disable/enable node labels?

Also, what happens to the app in the state store?  If we fail to recover it and just ignore
it, it will sit there forever, I suspect.  It's probably a bad thing if the RM and state store
don't agree on what apps are active.

> Application recovery failed after disabling node label
> ------------------------------------------------------
>
>                 Key: YARN-6031
>                 URL: https://issues.apache.org/jira/browse/YARN-6031
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.8.0
>            Reporter: Ying Zhang
>            Assignee: Ying Zhang
>            Priority: Minor
>         Attachments: YARN-6031.001.patch
>
>
> Here is the repro steps:
> Enable node label, restart RM, configure CS properly, and run some jobs;
> Disable node label, restart RM, and the following exception thrown:
> {noformat}
> Caused by: org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid
resource request, node label not enabled but request contains label expression
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         ... 10 more
> {noformat}
> During RM restart, application recovery failed due to that application had node label
expression specified while node label has been disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message