hadoop-yarn-issues mailing list archives

From "Bibin A Chundatt (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
Date Sun, 12 Jul 2015 18:12:05 GMT

     [ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-3894:
-----------------------------------
    Attachment: 0002-YARN-3894.patch

Hi [~leftnoteasy], attaching an updated patch for review that addresses the comments.
Log after the fix:
{code}
2015-07-12 23:37:51,101 INFO  [Thread-2] service.AbstractService (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in state INITED; cause: java.lang.IllegalArgumentException: Illegal capacity of 0.7 for children of queue root for label=z
java.lang.IllegalArgumentException: Illegal capacity of 0.7 for children of queue root for label=z
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:490)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:319)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:349)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:559)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:254)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:119)
	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:112)
	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:108)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing$1.<init>(TestQueueParsing.java:929)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing.testRMStartWrongNodeCapacity(TestQueueParsing.java:929)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:19)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -------------------------------------------------------------------------
>
>                 Key: YARN-3894
>                 URL: https://issues.apache.org/jira/browse/YARN-3894
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml
>
>
> Currently, in the CapacityScheduler, the RM shuts down when the capacity configuration is wrong,
> but not in the case of a NodeLabel capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}:
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
>     throws IOException {   
>     root = 
>         parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
>             queues, queues, noop);
>     labelManager.reinitializeQueueLabels(getQueueToLabels());
>     root = 
>         parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
>             queues, queues, noop);
>     LOG.info("Initialized root queue " + root);
>     initializeQueueMappings();
>     setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from the queues, and the check for a label-level capacity
> mismatch happens in {{parseQueue}}. So when {{parseQueue}} runs during initialization, the
> labels will be empty.
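
To make the ordering problem concrete, here is a minimal, self-contained Java sketch. It is not the Hadoop code: LabelCapacityCheckSketch, checkChildCapacities, rejects and the PRECISION tolerance are hypothetical names and values chosen for illustration, and the rule that children capacities must sum to 1.0 for each label is a simplifying assumption. It shows that a per-label check driven by the set of known labels passes trivially while that set is still empty, and rejects the same configuration once the label is known.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class LabelCapacityCheckSketch {
  // Assumed tolerance for float comparison; not taken from the Hadoop source.
  private static final float PRECISION = 0.0005f;

  /**
   * Hypothetical stand-in for a per-label capacity check: for every label in
   * knownLabels, the capacities of all children must sum to 1.0.
   * childCapacities maps child queue name to (label to capacity).
   */
  static void checkChildCapacities(String parent, Set<String> knownLabels,
      Map<String, Map<String, Float>> childCapacities) {
    for (String label : knownLabels) {
      float sum = 0f;
      for (Map<String, Float> perLabel : childCapacities.values()) {
        Float capacity = perLabel.get(label);
        sum += (capacity == null) ? 0f : capacity;
      }
      if (Math.abs(1.0f - sum) > PRECISION) {
        throw new IllegalArgumentException("Illegal capacity of " + sum
            + " for children of queue " + parent + " for label=" + label);
      }
    }
  }

  /** Returns true if the check throws, false if the configuration passes. */
  static boolean rejects(Set<String> knownLabels,
      Map<String, Map<String, Float>> childCapacities) {
    try {
      checkChildCapacities("root", knownLabels, childCapacities);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Two children of root whose capacities for label z sum to 0.7, not 1.0.
    Map<String, Map<String, Float>> children = new LinkedHashMap<>();
    children.put("a", Collections.singletonMap("z", 0.5f));
    children.put("b", Collections.singletonMap("z", 0.2f));

    // Labels not yet known: the loop body never runs, so nothing is checked.
    System.out.println(rejects(Collections.<String>emptySet(), children)); // false
    // Labels known: the same configuration is rejected.
    System.out.println(rejects(Collections.singleton("z"), children)); // true
  }
}
```

With an empty label set the check accepts children whose capacities for label z sum to 0.7; with z known, the same children are rejected, which is the behaviour expected at RM startup.
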
> *Steps to reproduce*
> # Configure the RM with the capacity scheduler
> # Add one or two node labels using rmadmin
> # Configure the capacity xml with node labels, but with a wrong capacity configuration for an
> already added label
> # Restart both RMs
> # Check whether the node label list is populated during service init of the capacity scheduler
> *Expected*
> The RM should not start
> *Current exception on reinitialize check*
> {code}
> 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues.
> java.io.IOException: Failed to re-init queues
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
>         ... 8 more
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   OPERATION=refreshQueues TARGET=AdminService     RESULT=FAILURE  DESCRIPTION=Exception refresh queues.   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   OPERATION=transitionToActive    TARGET=RMHAProtocolService      RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>         ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.IOException: Failed to re-init queues
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:617)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
>         ... 5 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
