hadoop-yarn-issues mailing list archives

From "Rohith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed
Date Thu, 16 Apr 2015 01:10:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497413#comment-14497413 ]

Rohith commented on YARN-3493:
------------------------------

The same problem would occur with RM work-preserving restart enabled, where a running AM
updates its ResourceRequest on a RESYNC command from the RM. The RM then throws
InvalidResourceRequestException to the AM, which the AM does not expect.
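
To make the failure mode concrete, here is a minimal Java sketch of the kind of check that rejects the request. It is not the Hadoop source: the class MemoryCheckSketch, the helper isValid, and the stand-in exception type are hypothetical and only model the condition and message text seen in the log quoted below (requestedMemory=3072, first against a maximum of 4000, then against the restored maximum of 2048).

```java
// Hypothetical, simplified model of the memory check; not the actual Hadoop code.
public class MemoryCheckSketch {

    /** Stand-in for YARN's InvalidResourceRequestException (illustrative only). */
    static class InvalidResourceRequestException extends Exception {
        InvalidResourceRequestException(String message) {
            super(message);
        }
    }

    /** Rejects a request whose memory is negative or above the configured maximum. */
    static void validateMemory(int requestedMemory, int maxMemory)
            throws InvalidResourceRequestException {
        if (requestedMemory < 0 || requestedMemory > maxMemory) {
            throw new InvalidResourceRequestException(
                "Invalid resource request, requested memory < 0, or requested memory"
                + " > max configured, requestedMemory=" + requestedMemory
                + ", maxMemory=" + maxMemory);
        }
    }

    /** Returns true if the request passes the check, false if it is rejected. */
    static boolean isValid(int requestedMemory, int maxMemory) {
        try {
            validateMemory(requestedMemory, maxMemory);
            return true;
        } catch (InvalidResourceRequestException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // At submission time the maximum was 4000, so a 3072 MB request is accepted.
        System.out.println(isValid(3072, 4000)); // prints "true"
        // After the maximum is lowered to 2048, the same stored request is rejected.
        System.out.println(isValid(3072, 2048)); // prints "false"
    }
}
```

The point of the sketch is that the request itself never changes; only the limit it is checked against does, so any code path that re-validates an already accepted request (recovery, or an AM reacting to RESYNC) can fail after a configuration change.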

> RM fails to come up with error "Failed to load/recover state" when mem settings are changed
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-3493
>                 URL: https://issues.apache.org/jira/browse/YARN-3493
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.0
>            Reporter: Sumana Sathish
>            Assignee: Jian He
>            Priority: Critical
>         Attachments: YARN-3493.1.patch, YARN-3493.2.patch, yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in the background and wait for the job to reach running state
> 3. Before the above job completes, restore yarn-site.xml so that yarn.scheduler.maximum-allocation-mb is 2048
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=3072, maxMemory=2048
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=3072, maxMemory=2048
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,624 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=3072, maxMemory=2048
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=3072, maxMemory=2048
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,625 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(211)) - Stopping ResourceManager metrics system...
> 2015-04-15 13:19:18,626 INFO  impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread interrupted.
> 2015-04-15 13:19:18,626 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(217)) - ResourceManager metrics system stopped.
> 2015-04-15 13:19:18,627 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(606)) - ResourceManager metrics system shutdown complete.
> 2015-04-15 13:19:18,627 INFO  event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(140)) - AsyncDispatcher is draining to stop, igonring any new events.
> 2015-04-15 13:19:18,633 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x44cbc922670001c closed
> 2015-04-15 13:19:18,633 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
> 2015-04-15 13:19:18,634 INFO  event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(140)) - AsyncDispatcher is draining to stop, igonring any new events.
> 2015-04-15 13:19:18,634 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service Dispatcher failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>         at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>         at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>         at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:601)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
