hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt
Date Wed, 22 Oct 2014 16:26:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180130#comment-14180130
] 

Jason Lowe commented on YARN-2010:
----------------------------------

We recently ran into a case where an application tried to recover with an expired token and
the InvalidToken exception thrown by the delegation token secret manager for this application
prevented the RM from coming up.

> RM can't transition to active if it can't recover an app attempt
> ----------------------------------------------------------------
>
>                 Key: YARN-2010
>                 URL: https://issues.apache.org/jira/browse/YARN-2010
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: bc Wong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, yarn-2010-3.patch,
yarn-2010-3.patch
>
>
> If the RM fails to recover an app attempt, it won't come up. We should make it more resilient.
> Specifically, the underlying error is that the app was submitted before Kerberos security
got turned on. Makes sense for the app to fail in this case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling
the winning of election 
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active 
> at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)

> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)

> at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)

> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active
mode 
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)

> at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)

> ... 4 more 
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException:
java.lang.IllegalArgumentException: Missing argument 
> at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)

> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)

> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)

> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)

> ... 5 more 
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException:
Missing argument 
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)

> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)

> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)

> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)

> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)

> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> ... 8 more 
> Caused by: java.lang.IllegalArgumentException: Missing argument 
> at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93) 
> at org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)

> at org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)

> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)

> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)

> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)

> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)

> ... 13 more 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message