hadoop-yarn-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt
Date Fri, 30 May 2014 07:53:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013398#comment-14013398 ]

Sandy Ryza commented on YARN-2010:
----------------------------------

Agree with Vinod that switching a non-secure cluster to a secure cluster is not currently
supported and is bound to have tons of issues.  I've come across other "bugs" that turned out
to stem from this.  If this is the only situation where we could conceivably hit this issue,
I'm somewhat dubious about whether it needs to be fixed.  On the other hand, in general, being
defensive and allowing the transition to active even when an app recovery fails makes sense
to me.
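
A minimal sketch of that defensive approach, in plain Java rather than the actual
ResourceManager code. The class name, the recoverAll method, and the idea of returning
the failed IDs are all hypothetical and only illustrate "log the failed app and keep
going" as opposed to "abort the whole transition":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class DefensiveRecoverySketch {

    /**
     * Tries to recover every stored application. A failure for one application is
     * recorded and skipped instead of being rethrown, so the caller can still
     * finish its transition. Returns the IDs that could not be recovered.
     * (Hypothetical helper, not part of YARN.)
     */
    static List<String> recoverAll(List<String> storedAppIds, Consumer<String> recoverOne) {
        List<String> failed = new ArrayList<>();
        for (String appId : storedAppIds) {
            try {
                recoverOne.accept(appId);
            } catch (RuntimeException e) {
                // Assumed policy: remember the failure and continue with the next app.
                failed.add(appId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> apps = List.of("app_1", "app_2", "app_3");
        List<String> failed = recoverAll(apps, id -> {
            // Simulate one app whose stored state cannot be recovered.
            if (id.equals("app_2")) {
                throw new IllegalArgumentException("Missing argument");
            }
        });
        System.out.println("Failed to recover: " + failed); // prints "Failed to recover: [app_2]"
    }
}
```

Whether a failed app should then be marked as failed, retried, or dropped is a separate
policy question that this sketch does not try to answer.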

> RM can't transition to active if it can't recover an app attempt
> ----------------------------------------------------------------
>
>                 Key: YARN-2010
>                 URL: https://issues.apache.org/jira/browse/YARN-2010
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: bc Wong
>            Assignee: Rohith
>            Priority: Critical
>         Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, yarn-2010-3.patch
>
>
> If the RM fails to recover an app attempt, it won't come up. We should make it more resilient.
> Specifically, the underlying error is that the app was submitted before Kerberos security
> got turned on. Makes sense for the app to fail in this case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
> at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
> at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument
> at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 8 more
> Caused by: java.lang.IllegalArgumentException: Missing argument
> at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93)
> at org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
> at org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
> ... 13 more
> {noformat}
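
The innermost cause in the trace above is an IllegalArgumentException thrown from the
SecretKeySpec constructor. The JDK constructor rejects a null key, which would be
consistent with (though the trace does not prove it) the recovered attempt having no
stored master key because it was submitted before security was enabled. A minimal
sketch of just the JDK behavior, with a hypothetical class name and an algorithm name
picked only for illustration:

```java
import javax.crypto.spec.SecretKeySpec;

public class MissingKeySketch {

    /** Returns true if constructing a SecretKeySpec from a null key is rejected. */
    static boolean rejectsNullKey() {
        try {
            // Assumption: a null key stands in for key material that was never stored.
            new SecretKeySpec((byte[]) null, "HmacSHA1");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null key rejected: " + rejectsNullKey());
    }
}
```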



--
This message was sent by Atlassian JIRA
(v6.2#6252)
