hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zack Marsh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3871) ResourceManager down after Blueprint install
Date Tue, 30 Jun 2015 21:24:05 GMT

     [ https://issues.apache.org/jira/browse/YARN-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zack Marsh updated YARN-3871:
-----------------------------
    Description: 
On a 3-Master HDP 2.3 cluster installed with HDP-2.3.0.0-2482 and Ambari-2.1.0-1266, the YARN ResourceManager was down following the Blueprint install.

It's important to note that nothing failed during the Blueprint install. The ResourceManager shutdown because of an inability to connect to Zookeeper.

Excerpt from the ResourceManager log:
{code}
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=:/usr/hdp/2.3.0.0-2482/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2482/hadoop/lib/native:/usr/hdp/2.3.0.0-2482/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2482/hadoop/lib/native
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=3.0.101-0.50.TDC.1.R.0-default
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=yarn
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/home/yarn
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.3.0.0-2482/hadoop-yarn
2015-06-26 03:35:47,190 INFO  zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=piripiri2.labs.teradata.com:2181,piripiri1.labs.teradata.com:2181,piripiri3.labs.teradata.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@59d2103b
2015-06-26 03:35:47,209 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:47,276 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:47,380 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:47,381 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:47,452 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:48,067 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:48,378 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:49,914 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:49,915 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:50,028 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:50,028 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:50,030 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:50,133 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:50,134 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:52,064 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:52,065 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:52,901 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:52,901 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:52,902 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:53,570 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:53,571 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:55,541 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:55,542 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:56,513 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:56,514 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:56,515 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:56,821 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:56,822 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:57,205 ERROR ha.ActiveStandbyElector (ActiveStandbyElector.java:waitForZKConnectionEvent(1044)) - Connection timed out: couldn't connect to ZooKeeper in 10000 milliseconds
2015-06-26 03:35:57,396 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x0 closed
2015-06-26 03:35:57,397 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
2015-06-26 03:35:57,403 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
2015-06-26 03:35:57,404 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 7 more
2015-06-26 03:35:57,404 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 7 more
2015-06-26 03:35:57,405 INFO  resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1068)) - Transitioning to standby state
2015-06-26 03:35:57,405 INFO  resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1075)) - Transitioned to standby state
2015-06-26 03:35:57,405 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1230)) - Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 7 more
2015-06-26 03:35:57,407 INFO  resourcemanager.ResourceManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at piripiri3/39.0.40.3
************************************************************/
{code}

This issue was observed again on a 3-Master cluster installed with HDP-2.3.0.0-2497 and Ambari-2.1.0-1295.

YARN logs attached.

  was:
On a 3-Master HDP 2.3 cluster installed with HDP-2.3.0.0-2482 and Ambari-2.1.0-1266, the YARN ResourceManager was down following the Blueprint install.

It's important to note that nothing failed during the Blueprint install. The ResourceManager shutdown because of an inability to connect to Zookeeper.

Excerpt from the ResourceManager log:
{code}
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=:/usr/hdp/2.3.0.0-2482/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2482/hadoop/lib/native:/usr/hdp/2.3.0.0-2482/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2482/hadoop/lib/native
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=3.0.101-0.50.TDC.1.R.0-default
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=yarn
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/home/yarn
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.3.0.0-2482/hadoop-yarn
2015-06-26 03:35:47,190 INFO  zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=piripiri2.labs.teradata.com:2181,piripiri1.labs.teradata.com:2181,piripiri3.labs.teradata.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@59d2103b
2015-06-26 03:35:47,209 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:47,276 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:47,380 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:47,381 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:47,452 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:48,067 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:48,378 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:49,914 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:49,915 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:50,028 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:50,028 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:50,030 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:50,133 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:50,134 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:52,064 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:52,065 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:52,901 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:52,901 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:52,902 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:53,570 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:53,571 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:55,541 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:55,542 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:56,513 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:56,514 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:56,515 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:56,821 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:56,822 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:57,205 ERROR ha.ActiveStandbyElector (ActiveStandbyElector.java:waitForZKConnectionEvent(1044)) - Connection timed out: couldn't connect to ZooKeeper in 10000 milliseconds
2015-06-26 03:35:57,396 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x0 closed
2015-06-26 03:35:57,397 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
2015-06-26 03:35:57,403 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
2015-06-26 03:35:57,404 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 7 more
2015-06-26 03:35:57,404 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 7 more
2015-06-26 03:35:57,405 INFO  resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1068)) - Transitioning to standby state
2015-06-26 03:35:57,405 INFO  resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1075)) - Transitioned to standby state
2015-06-26 03:35:57,405 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1230)) - Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 7 more
2015-06-26 03:35:57,407 INFO  resourcemanager.ResourceManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at piripiri3/39.0.40.3
************************************************************/
{code}

This issue was observed again on a 3-Master cluster installed with HDP-2.3.0.0-2497 and Ambari-2.1.0-1295.




> ResourceManager down after Blueprint install 
> ---------------------------------------------
>
>                 Key: YARN-3871
>                 URL: https://issues.apache.org/jira/browse/YARN-3871
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>         Environment: ambari-2.1.0-1295, hdp-2.3.0.0-2497, sles11sp3
>            Reporter: Zack Marsh
>            Priority: Critical
>         Attachments: yarn-yarn-resourcemanager-piripiri3.log, yarn-yarn-resourcemanager-piripiri3.out
>
>
> On a 3-Master HDP 2.3 cluster installed with HDP-2.3.0.0-2482 and Ambari-2.1.0-1266, the YARN ResourceManager was down following the Blueprint install.
> It's important to note that nothing failed during the Blueprint install. The ResourceManager shutdown because of an inability to connect to Zookeeper.
> Excerpt from the ResourceManager log:
> {code}
> 2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=:/usr/hdp/2.3.0.0-2482/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2482/hadoop/lib/native:/usr/hdp/2.3.0.0-2482/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2482/hadoop/lib/native
> 2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=3.0.101-0.50.TDC.1.R.0-default
> 2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=yarn
> 2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/home/yarn
> 2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.3.0.0-2482/hadoop-yarn
> 2015-06-26 03:35:47,190 INFO  zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=piripiri2.labs.teradata.com:2181,piripiri1.labs.teradata.com:2181,piripiri3.labs.teradata.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@59d2103b
> 2015-06-26 03:35:47,209 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:47,276 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> 2015-06-26 03:35:47,380 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:47,381 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
> 2015-06-26 03:35:47,452 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
> 2015-06-26 03:35:48,067 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:48,378 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> 2015-06-26 03:35:49,914 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:49,915 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> 2015-06-26 03:35:50,028 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:50,028 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
> 2015-06-26 03:35:50,030 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
> 2015-06-26 03:35:50,133 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:50,134 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> 2015-06-26 03:35:52,064 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:52,065 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> 2015-06-26 03:35:52,901 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:52,901 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
> 2015-06-26 03:35:52,902 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
> 2015-06-26 03:35:53,570 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:53,571 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> 2015-06-26 03:35:55,541 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:55,542 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> 2015-06-26 03:35:56,513 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:56,514 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
> 2015-06-26 03:35:56,515 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
> 2015-06-26 03:35:56,821 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-26 03:35:56,822 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> 2015-06-26 03:35:57,205 ERROR ha.ActiveStandbyElector (ActiveStandbyElector.java:waitForZKConnectionEvent(1044)) - Connection timed out: couldn't connect to ZooKeeper in 10000 milliseconds
> 2015-06-26 03:35:57,396 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x0 closed
> 2015-06-26 03:35:57,397 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
> 2015-06-26 03:35:57,403 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
>         at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
>         at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
>         at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
>         at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
> 2015-06-26 03:35:57,404 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
>         at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
>         at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
>         at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
>         at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 7 more
> 2015-06-26 03:35:57,404 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
>         at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
>         at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
>         at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
>         at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 7 more
> 2015-06-26 03:35:57,405 INFO  resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1068)) - Transitioning to standby state
> 2015-06-26 03:35:57,405 INFO  resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1075)) - Transitioned to standby state
> 2015-06-26 03:35:57,405 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1230)) - Error starting ResourceManager
> org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
>         at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
>         at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
>         at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
>         at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 7 more
> 2015-06-26 03:35:57,407 INFO  resourcemanager.ResourceManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down ResourceManager at piripiri3/39.0.40.3
> ************************************************************/
> {code}
> This issue was observed again on a 3-Master cluster installed with HDP-2.3.0.0-2497 and Ambari-2.1.0-1295.
> YARN logs attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message