hadoop-yarn-issues mailing list archives

From "Ying Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery
Date Wed, 08 Feb 2017 04:22:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857371#comment-15857371 ]

Ying Zhang commented on YARN-6031:

Hi [~sunilg], sorry for the late reply (I was out for the Spring Festival holiday). Here is
the patch for branch-2.8; please have a look.
While preparing the branch-2.8 patch, I found a problem with the test case. TestRMRestart
runs every test case once for CapacityScheduler and once for FairScheduler, but this test case
can only pass with CapacityScheduler because it runs an application with a node label
specified. We don't see this problem on trunk because, since YARN-4805, TestRMRestart
runs only for CapacityScheduler. I've modified the test case slightly so that it runs only
when the scheduler is CapacityScheduler:
  public void testRMRestartAfterNodeLabelDisabled() throws Exception {
    // Skip this test case if it is not CapacityScheduler since NodeLabel is
    // not fully supported yet for FairScheduler and others.
    if (!getSchedulerType().equals(SchedulerType.CAPACITY)) {
We should probably make this change on trunk too. Let me know whether you want to make the
change through this JIRA or whether I should open another JIRA for it.
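
For readers without the patch at hand, the guard quoted above can be sketched in isolation. This is only a minimal, self-contained illustration: the class SchedulerGuardSketch, its enum, and the early return used as the body of the guard are assumptions made for this example, not the contents of the attached patch or of TestRMRestart.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a test class that is run once per scheduler type.
class SchedulerGuardSketch {
  // Assumed enum for this sketch; it only mirrors the names used in the comment.
  enum SchedulerType { CAPACITY, FAIR }

  private final SchedulerType schedulerType;
  // Records which test bodies actually ran, so the skip is observable.
  final List<String> executed = new ArrayList<>();

  SchedulerGuardSketch(SchedulerType schedulerType) {
    this.schedulerType = schedulerType;
  }

  SchedulerType getSchedulerType() {
    return schedulerType;
  }

  void testRMRestartAfterNodeLabelDisabled() {
    // Skip unless the scheduler is CapacityScheduler. Returning early is one
    // plausible way to complete the guard; it is an assumption of this sketch.
    if (!getSchedulerType().equals(SchedulerType.CAPACITY)) {
      return;
    }
    // The node-label-specific test body would go here.
    executed.add("testRMRestartAfterNodeLabelDisabled");
  }

  public static void main(String[] args) {
    for (SchedulerType type : SchedulerType.values()) {
      SchedulerGuardSketch run = new SchedulerGuardSketch(type);
      run.testRMRestartAfterNodeLabelDisabled();
      System.out.println(type + " -> executed=" + run.executed);
    }
  }
}
```

A silent early return makes a skipped run look like a pass in test reports. A test framework's assumption mechanism would report the run as skipped instead, which may be preferable if the guard is also applied on trunk.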

> Application recovery has failed when node label feature is turned off during RM recovery
> ----------------------------------------------------------------------------------------
>                 Key: YARN-6031
>                 URL: https://issues.apache.org/jira/browse/YARN-6031
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.8.0
>            Reporter: Ying Zhang
>            Assignee: Ying Zhang
>            Priority: Minor
>         Attachments: YARN-6031.001.patch, YARN-6031.002.patch, YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch, YARN-6031.006.patch, YARN-6031.007.patch
> Here are the repro steps:
> Enable the node label feature, restart RM, configure CS properly, and run some jobs;
> Disable the node label feature, restart RM, and the following exception is thrown:
> {noformat}
> Caused by: org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid resource request, node label not enabled but request contains label expression
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         ... 10 more
> {noformat}
> During RM restart, application recovery failed because the application had a node label
> expression specified while the node label feature had been disabled.
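
The failure mode in the description can be sketched in isolation. The example below is a simplified model, not ResourceManager code: LabelRecoverySketch, SavedApp, and both recovery methods are hypothetical names introduced here. The strict variant shows how one saved application with a label expression can abort recovery once labels are disabled. The tolerant variant is only one conceivable alternative and is not necessarily what the YARN-6031 patches do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LabelRecoverySketch {
  // Hypothetical stand-in for InvalidLabelResourceRequestException.
  static class InvalidLabelRequestException extends Exception {
    InvalidLabelRequestException(String message) {
      super(message);
    }
  }

  // Minimal view of a saved application: an id plus the label expression
  // it was submitted with (null or empty means no label).
  static class SavedApp {
    final String id;
    final String labelExpression;

    SavedApp(String id, String labelExpression) {
      this.id = id;
      this.labelExpression = labelExpression;
    }
  }

  // Rejects a request that carries a label expression while labels are disabled.
  static void validate(SavedApp app, boolean nodeLabelsEnabled)
      throws InvalidLabelRequestException {
    boolean hasLabel = app.labelExpression != null && !app.labelExpression.isEmpty();
    if (!nodeLabelsEnabled && hasLabel) {
      throw new InvalidLabelRequestException(
          "node label not enabled but request contains label expression: " + app.id);
    }
  }

  // Strict recovery: the first invalid application aborts the whole recovery.
  static List<String> recoverStrict(List<SavedApp> apps, boolean nodeLabelsEnabled)
      throws InvalidLabelRequestException {
    List<String> recovered = new ArrayList<>();
    for (SavedApp app : apps) {
      validate(app, nodeLabelsEnabled);
      recovered.add(app.id);
    }
    return recovered;
  }

  // Tolerant recovery (an assumption of this sketch): invalid applications are
  // recorded in 'rejected' and the remaining ones are still recovered.
  static List<String> recoverTolerant(List<SavedApp> apps, boolean nodeLabelsEnabled,
      List<String> rejected) {
    List<String> recovered = new ArrayList<>();
    for (SavedApp app : apps) {
      try {
        validate(app, nodeLabelsEnabled);
        recovered.add(app.id);
      } catch (InvalidLabelRequestException e) {
        rejected.add(app.id);
      }
    }
    return recovered;
  }

  public static void main(String[] args) {
    List<SavedApp> apps = Arrays.asList(
        new SavedApp("app_1", null), new SavedApp("app_2", "x"));
    try {
      recoverStrict(apps, false);
    } catch (InvalidLabelRequestException e) {
      System.out.println("strict recovery aborted: " + e.getMessage());
    }
    List<String> rejected = new ArrayList<>();
    System.out.println("tolerant recovered=" + recoverTolerant(apps, false, rejected)
        + " rejected=" + rejected);
  }
}
```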
