hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6072) RM unable to start in secure mode
Date Tue, 10 Jan 2017 19:20:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815911#comment-15815911 ]

Naganarasimha G R commented on YARN-6072:
-----------------------------------------

Thanks [~jianhe],
bq. If HA is not enabled, this call will be adding 'null' elector 
We had discussed the same point offline, but {{addIfService}} contains an {{instanceof}} check,
which fails for null, so a null elector is not added as a service.
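
To illustrate the point about the {{instanceof}} check, here is a minimal sketch. It assumes a simplified stand-in for the composite service; the class {{AddIfServiceSketch}}, its nested {{Service}} interface, and {{serviceCount()}} are hypothetical names for illustration, not the actual Hadoop classes. It only shows that Java's {{instanceof}} evaluates to false for null, so a null elector is skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class AddIfServiceSketch {
    /** Hypothetical stand-in for a service interface (not the Hadoop one). */
    public interface Service {
        String getName();
    }

    private final List<Service> services = new ArrayList<>();

    /** Adds the object only if it is a Service; returns whether it was added. */
    public boolean addIfService(Object object) {
        // instanceof is always false for null, so null is never added.
        if (object instanceof Service) {
            services.add((Service) object);
            return true;
        }
        return false;
    }

    public int serviceCount() {
        return services.size();
    }

    public static void main(String[] args) {
        AddIfServiceSketch composite = new AddIfServiceSketch();
        Service elector = null; // assumption: models the "HA not enabled" case
        System.out.println("null elector added: " + composite.addIfService(elector));
        Service admin = () -> "AdminService";
        System.out.println("admin added: " + composite.addIfService(admin));
        System.out.println("service count: " + composite.serviceCount());
    }
}
```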

bq. I think we can either move the entire elector creation code after add admin service, or
move add admin service before adding elector.
Actually, we were not sure which steps need to be done before login (and why), based on the
comment {{"// Set HA configuration should be done before login"}}, so to be on the safer side
we only moved the addition of the Elector service to below the adminService.
If you can give more input on this, we can correct it.

bq. I think, the ex.getMessage will just be duplicated in the log trace
Hmm, yes, but we additionally get the log trace too. The current issue is a code error, and the NPE
trace was not showing up, hence we added it.
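
The trade-off discussed here can be sketched as follows. This is only an illustration under assumptions: {{LogDuplicationSketch}}, {{render}}, and {{count}} are hypothetical helpers that mimic a logger printing a message followed by the stack trace; they are not the logging calls used in the patch. The sketch shows that when {{ex.getMessage()}} is put into the log message and the exception is also passed for its trace, the message text appears twice.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class LogDuplicationSketch {
    /** Mimics a logger: prints the message, then the throwable's stack trace. */
    public static String render(String message, Throwable t) {
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        pw.println(message);
        t.printStackTrace(pw); // first trace line is t.toString(), which includes the message
        pw.flush();
        return out.toString();
    }

    /** Counts non-overlapping occurrences of needle in text. */
    public static int count(String text, String needle) {
        int n = 0;
        for (int i = text.indexOf(needle); i >= 0;
             i = text.indexOf(needle, i + needle.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Exception ex = new IllegalStateException("server is null");
        String logged = render("RefreshAll failed: " + ex.getMessage(), ex);
        System.out.println(logged);
        System.out.println("occurrences of message: " + count(logged, "server is null"));
    }
}
```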


 


> RM unable to start in secure mode
> ---------------------------------
>
>                 Key: YARN-6072
>                 URL: https://issues.apache.org/jira/browse/YARN-6072
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.0, 3.0.0-alpha2
>            Reporter: Bibin A Chundatt
>            Assignee: Ajith S
>            Priority: Blocker
>         Attachments: YARN-6072.01.branch-2.8.patch, YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found resource hadoop-policy.xml at file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll during transition to Active
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
>         ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
>         ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During ResourceManager service start(), EmbeddedElector starts first and invokes {{AdminService#refreshAll()}},
> but {{AdminService#serviceStart()}} happens only after the {{ActiveStandbyElectorBasedElectorService}}
> service start is complete. So {{AdminService#server}} will be *null*, which causes {{AdminService#refreshAll()}}
> to fail:
> {code}
>       if (getConfig().getBoolean(
>           CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION,
>           false)) {
>         refreshServiceAcls();
>       }
> {code}
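
The start-order problem described above can be reproduced in a small, self-contained sketch. Everything here is a simplified assumption for illustration: {{StartOrderSketch}} and its nested {{Service}}, {{AdminService}}, and {{Elector}} classes are hypothetical stand-ins, not the real ResourceManager classes, and the elector calls {{refreshAll()}} synchronously, whereas in the trace above it is invoked from the ZooKeeper event thread. The sketch only shows why starting the elector before the admin service dereferences a null {{server}} field.

```java
import java.util.ArrayList;
import java.util.List;

public class StartOrderSketch {
    /** Hypothetical minimal service interface. */
    interface Service {
        void start();
    }

    /** Stand-in for the admin service: 'server' exists only after start(). */
    static class AdminService implements Service {
        private Object server; // null until start() has run

        @Override
        public void start() {
            server = new Object();
        }

        void refreshAll() {
            // Dereferences 'server'; throws NullPointerException if start() has not run yet.
            server.hashCode();
        }
    }

    /** Stand-in for the elector: becoming active triggers refreshAll(). */
    static class Elector implements Service {
        private final AdminService admin;

        Elector(AdminService admin) {
            this.admin = admin;
        }

        @Override
        public void start() {
            admin.refreshAll();
        }
    }

    /** Starts services in list order; returns false if a NullPointerException occurs. */
    static boolean startAll(List<Service> services) {
        try {
            for (Service s : services) {
                s.start();
            }
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    /** Builds both services and starts them in the requested order. */
    public static boolean startWithOrder(boolean electorFirst) {
        AdminService admin = new AdminService();
        Elector elector = new Elector(admin);
        List<Service> services = new ArrayList<>();
        if (electorFirst) {
            services.add(elector);
            services.add(admin);
        } else {
            services.add(admin);
            services.add(elector);
        }
        return startAll(services);
    }

    public static void main(String[] args) {
        System.out.println("elector first succeeds: " + startWithOrder(true));
        System.out.println("admin first succeeds:   " + startWithOrder(false));
    }
}
```

In this sketch, adding the admin service before the elector is enough to avoid the failure, which corresponds to the reordering discussed in the comment above.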



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

