hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
Date Thu, 05 Jan 2017 04:11:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800249#comment-15800249
] 

Rohith Sharma K S commented on YARN-6009:
-----------------------------------------

bq. What is the impact of allowing an app to be recovered with a 0 timeout value?
No impact, just by passing to make service up. 

bq. Is that going to cause the recovered app to immediately expire?
Noting happens to app. App will NOT be monitored for timeout. Moreover, App with 0 timeout
was not monitored before RM restart also.

bq. Also, you may as well do the validation before the queue mapping.
Both queue mapping and timeout check are pre-validation process. The order of preference should
not matter.

> RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid
application timeout, value=0 for type=LIFETIME)
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6009
>                 URL: https://issues.apache.org/jira/browse/YARN-6009
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Gour Saha
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler (CapacityScheduler.java:initScheduler(328))
- Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator,
minimumAllocation=<<memory:256, vCores:1>>, maximumAllocation=<<memory:101376,
vCores:64>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector (ActiveStandbyElector.java:becomeActive(863))
- Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active
mode
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
>         ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException:
Invalid application timeout, value=0 for type=LIFETIME
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
>         ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout,
value=0 for type=LIFETIME
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         ... 13 more
> {code}
> {color}
> Exception 2:
> {color:red}
> {code}
> 2016-12-09 14:57:26,162 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - application_1477927786494_0008
State change from NEW to FINISHED
> 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(599))
- Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, value=0
for type=LIFETIME
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> 2016-12-09 14:57:26,162 INFO  service.AbstractService (AbstractService.java:noteFailure(272))
- Service RMActiveServices failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnException:
Invalid application timeout, value=0 for type=LIFETIME
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, value=0
for type=LIFETIME
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> {code}
> {color}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message