ambari-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-21204) Yarn stopped by itself after start. HA run
Date Mon, 12 Jun 2017 16:06:02 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-21204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16046726#comment-16046726 ]

Hudson commented on AMBARI-21204:
---------------------------------

SUCCESS: Integrated in Jenkins build Ambari-branch-2.5 #1590 (See [https://builds.apache.org/job/Ambari-branch-2.5/1590/])
AMBARI-21204 Yarn stopped by itself after start. HA run (dsen) (dsen: [http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=262e84f0d1cd8f26ecd5e19c47c5b37fb0a9fcf9])
* (edit) ambari-agent/src/test/java/org/apache/ambari/tools/zk/ZkMigratorTest.java
* (edit) ambari-agent/src/main/java/org/apache/ambari/tools/zk/ZkMigrator.java
* (edit) ambari-common/src/main/python/resource_management/core/resources/zkmigrator.py
* (edit) ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py


> Yarn stopped by itself after start. HA run
> ------------------------------------------
>
>                 Key: AMBARI-21204
>                 URL: https://issues.apache.org/jira/browse/AMBARI-21204
>             Project: Ambari
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Dmytro Sen
>            Assignee: Dmytro Sen
>            Priority: Critical
>             Fix For: 2.5.2
>
>         Attachments: AMBARI-21204_3.patch
>
>
> From RM logs :
> {code}
> 2017-06-07 14:23:19,191 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1240)) - Error starting ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Couldn't set ACLs on parent ZNode: /yarn-leader-election
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:152)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:281)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1236)
> Caused by: java.io.IOException: Couldn't set ACLs on parent ZNode: /yarn-leader-election
>         at org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:351)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:103)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 7 more
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /yarn-leader-election
> {code}
> The problem is that disabling security changes the ZooKeeper ACL of the ResourceManager znode, a behavior introduced as part of AMBARI-19331. After the recent change in HDFS-11403, the RM checks the znode version and fails if it differs from the expected one.
> The correct fix could be to remove the znode while disabling security, rather than break the election znode's consistency by manually changing its ACL to all. The RM should then create it with the proper ACL.
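
The failure mode described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not Ambari, Hadoop, or ZooKeeper code: ToyZk, ensure_parent_znode, the ACL strings, and the "expected ACL version 0" rule are all hypothetical stand-ins for a versioned setACL call that rejects a znode whose ACL was changed out of band.

```python
class BadVersionError(Exception):
    """Stand-in for ZooKeeper's BadVersion error (hypothetical name)."""


class ToyZk:
    """In-memory stand-in for a ZooKeeper server; tracks only ACLs and ACL versions."""

    def __init__(self):
        self._nodes = {}  # path -> {"acl": [...], "aversion": int}

    def exists(self, path):
        return path in self._nodes

    def create(self, path, acl):
        self._nodes[path] = {"acl": list(acl), "aversion": 0}

    def delete(self, path):
        del self._nodes[path]

    def get_acl(self, path):
        return list(self._nodes[path]["acl"])

    def set_acl(self, path, acl, expected_version):
        node = self._nodes[path]
        # -1 means "skip the version check"; anything else must match.
        if expected_version != -1 and expected_version != node["aversion"]:
            raise BadVersionError(path)
        node["acl"] = list(acl)
        node["aversion"] += 1


PATH = "/yarn-leader-election"
SECURE_ACL = ["sasl:rm:cdrwa"]          # made-up ACL for the secured cluster
OPEN_ACL = ["world:anyone:cdrwa"]       # "ACL changed to all" by the migration
RM_ACL = ["world:anyone:rwcda"]         # made-up ACL the RM wants afterwards


def ensure_parent_znode(zk, path, wanted_acl):
    """Hypothetical RM start-up step: create the parent znode, or re-apply
    the wanted ACL assuming nobody has changed the ACL since creation."""
    if not zk.exists(path):
        zk.create(path, wanted_acl)
    elif zk.get_acl(path) != list(wanted_acl):
        zk.set_acl(path, wanted_acl, expected_version=0)


def disable_by_acl_rewrite(zk):
    # Old behavior: open up the ACL in place, which bumps the ACL version.
    zk.set_acl(PATH, OPEN_ACL, -1)


def disable_by_delete(zk):
    # Proposed behavior: drop the znode and let the RM re-create it.
    zk.delete(PATH)


def start_rm_after(disable_security):
    zk = ToyZk()
    ensure_parent_znode(zk, PATH, SECURE_ACL)  # first start, secured cluster
    disable_security(zk)
    try:
        ensure_parent_znode(zk, PATH, RM_ACL)  # restart after disabling security
    except BadVersionError:
        return "BadVersion"
    return "started"


if __name__ == "__main__":
    print(start_rm_after(disable_by_acl_rewrite))
    print(start_rm_after(disable_by_delete))
```

In this model the in-place ACL rewrite leaves a znode whose ACL version no longer matches what the restart step expects, while deleting the znode lets the restart step create it fresh with the ACL it wants.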



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
