hadoop-yarn-issues mailing list archives

From "bc Wong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-2010) RM can't transition to active if it can't recover an app attempt
Date Wed, 30 Apr 2014 18:58:15 GMT
bc Wong created YARN-2010:
-----------------------------

             Summary: RM can't transition to active if it can't recover an app attempt
                 Key: YARN-2010
                 URL: https://issues.apache.org/jira/browse/YARN-2010
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.3.0
            Reporter: bc Wong


If the RM fails to recover an app attempt, it cannot transition to active and never comes up.
We should make it more resilient.

Specifically, the underlying error here is that the app was submitted before Kerberos security
was turned on. It makes sense for that app to fail recovery in this case, but YARN should still start.
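
To illustrate the kind of resilience being asked for, here is a minimal sketch (not the actual RMAppManager code). It assumes recovery can be isolated per application, so a failure in one app is recorded and the loop moves on. The names ResilientRecoverySketch, RecoverableApp, and recoverAll are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResilientRecoverySketch {

    /** Hypothetical stand-in for a stored application that can be recovered. */
    interface RecoverableApp {
        void recover() throws Exception;
    }

    /**
     * Tries to recover every app and returns the IDs of the ones that failed,
     * instead of propagating the first exception to the caller.
     */
    static List<String> recoverAll(Map<String, RecoverableApp> apps) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, RecoverableApp> entry : apps.entrySet()) {
            try {
                entry.getValue().recover();
            } catch (Exception ex) {
                // Isolate the failure: remember the app and keep recovering the rest.
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, RecoverableApp> apps = new LinkedHashMap<>();
        apps.put("app_1", () -> { });
        apps.put("app_2", () -> { throw new IllegalArgumentException("Missing argument"); });
        apps.put("app_3", () -> { });

        // prints "Failed to recover: [app_2]"
        System.out.println("Failed to recover: " + recoverAll(apps));
    }
}
```

With this shape, the caller could mark the listed apps as failed and still finish the transition to active. Whether that is the right policy for the RM is a design question for this issue.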

{noformat}
2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
    ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
    ... 5 more
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    ... 8 more
Caused by: java.lang.IllegalArgumentException: Missing argument
    at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93)
    at org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
    at org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
    ... 13 more
{noformat}
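
The innermost frames end in the SecretKeySpec constructor, which rejects null arguments with an IllegalArgumentException. One plausible reading, which is an assumption and not confirmed above, is that the attempt saved before security was enabled has no client-to-AM master key, so a null key reaches that constructor during recovery. The standalone sketch below only demonstrates the JDK behavior. The class name MissingKeySketch, the helper isRejected, and the "HmacSHA1" algorithm name are chosen for illustration.

```java
import javax.crypto.spec.SecretKeySpec;

public class MissingKeySketch {

    /** Returns true if SecretKeySpec refuses to build a key from the given bytes. */
    static boolean isRejected(byte[] keyBytes) {
        try {
            // The algorithm name is an assumption made for this demo.
            new SecretKeySpec(keyBytes, "HmacSHA1");
            return false;
        } catch (IllegalArgumentException e) {
            // A null key ends up here; this is the exception type seen in the trace.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(null));                 // prints "true"
        System.out.println(isRejected(new byte[] {1, 2, 3})); // prints "false"
    }
}
```

If that reading is right, a null check before registering the master key, or treating the attempt as failed when the key is absent, would keep this one app from blocking the transition.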



--
This message was sent by Atlassian JIRA
(v6.2#6252)
