hadoop-yarn-issues mailing list archives

From "Zack Marsh (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3871) ResourceManager down after Blueprint install
Date Tue, 30 Jun 2015 21:24:04 GMT
Zack Marsh created YARN-3871:
--------------------------------

             Summary: ResourceManager down after Blueprint install 
                 Key: YARN-3871
                 URL: https://issues.apache.org/jira/browse/YARN-3871
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.7.1
         Environment: ambari-2.1.0-1295, hdp-2.3.0.0-2497, sles11sp3

            Reporter: Zack Marsh
            Priority: Critical
         Attachments: yarn-yarn-resourcemanager-piripiri3.log, yarn-yarn-resourcemanager-piripiri3.out

On a 3-Master HDP 2.3 cluster installed with HDP-2.3.0.0-2482 and Ambari-2.1.0-1266, the YARN
ResourceManager was down following the Blueprint install.

Note that nothing failed during the Blueprint install itself. The ResourceManager
shut down because it could not connect to ZooKeeper.
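
To check whether each ZooKeeper server is reachable from the ResourceManager host, a probe along
the following lines could be used. This is a minimal sketch, not part of the original report: the
function name zk_four_letter is hypothetical, and it assumes that the ZooKeeper build in use still
answers four-letter commands such as "ruok" on the client port (2181 in the log below). Newer
ZooKeeper releases may require these commands to be whitelisted.

```python
import socket


def zk_four_letter(host, port=2181, command="ruok", timeout=2.0):
    """Send a ZooKeeper four-letter command and return the reply as text.

    Returns a string starting with "error:" if the TCP connection or the
    read fails (for example "Connection refused" or a timeout).
    """
    try:
        # The timeout applies to the connect and to the later send/recv calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(command.encode("ascii"))
            chunks = []
            while True:
                # The server closes the socket after replying, so read to EOF.
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except OSError as exc:
        return f"error: {exc}"
    reply = b"".join(chunks).decode("ascii", "replace").strip()
    return reply or "no reply (socket closed)"
```

Calling zk_four_letter(host) for each of the three hosts in the connect string would distinguish
a server that is not listening at all (an "error:" result) from one that accepts the connection.
A reply of "imok" only shows that the server process is up and bound to the client port; it does
not show that the server has joined a quorum.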

Excerpt from the ResourceManager log:
{code}
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client
environment:java.library.path=:/usr/hdp/2.3.0.0-2482/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2482/hadoop/lib/native:/usr/hdp/2.3.0.0-2482/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2482/hadoop/lib/native
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client
environment:java.io.tmpdir=/tmp
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client
environment:java.compiler=<NA>
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client
environment:os.name=Linux
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client
environment:os.arch=amd64
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client
environment:os.version=3.0.101-0.50.TDC.1.R.0-default
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client
environment:user.name=yarn
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client
environment:user.home=/home/yarn
2015-06-26 03:35:47,188 INFO  zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client
environment:user.dir=/usr/hdp/2.3.0.0-2482/hadoop-yarn
2015-06-26 03:35:47,190 INFO  zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating
client connection, connectString=piripiri2.labs.teradata.com:2181,piripiri1.labs.teradata.com:2181,piripiri3.labs.teradata.com:2181
sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@59d2103b
2015-06-26 03:35:47,209 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:47,276 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0
for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:47,380 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:47,381 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852))
- Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating
session
2015-06-26 03:35:47,452 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to
read additional data from server sessionid 0x0, likely server has closed socket, closing socket
connection and attempting reconnect
2015-06-26 03:35:48,067 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:48,378 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0
for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:49,914 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:49,915 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0
for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:50,028 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:50,028 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852))
- Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating
session
2015-06-26 03:35:50,030 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to
read additional data from server sessionid 0x0, likely server has closed socket, closing socket
connection and attempting reconnect
2015-06-26 03:35:50,133 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:50,134 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0
for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:52,064 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:52,065 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0
for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:52,901 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:52,901 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852))
- Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating
session
2015-06-26 03:35:52,902 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to
read additional data from server sessionid 0x0, likely server has closed socket, closing socket
connection and attempting reconnect
2015-06-26 03:35:53,570 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:53,571 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0
for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:55,541 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:55,542 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0
for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:56,513 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:56,514 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852))
- Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating
session
2015-06-26 03:35:56,515 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to
read additional data from server sessionid 0x0, likely server has closed socket, closing socket
connection and attempting reconnect
2015-06-26 03:35:56,821 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975))
- Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not
attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:56,822 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0
for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:57,205 ERROR ha.ActiveStandbyElector (ActiveStandbyElector.java:waitForZKConnectionEvent(1044))
- Connection timed out: couldn't connect to ZooKeeper in 10000 milliseconds
2015-06-26 03:35:57,396 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x0
closed
2015-06-26 03:35:57,397 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread
shut down
2015-06-26 03:35:57,403 INFO  service.AbstractService (AbstractService.java:noteFailure(272))
- Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state
INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
= ConnectionLoss
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
2015-06-26 03:35:57,404 INFO  service.AbstractService (AbstractService.java:noteFailure(272))
- Service org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state INITED;
cause: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode =
ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 7 more
2015-06-26 03:35:57,404 INFO  service.AbstractService (AbstractService.java:noteFailure(272))
- Service ResourceManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode =
ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 7 more
2015-06-26 03:35:57,405 INFO  resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1068))
- Transitioning to standby state
2015-06-26 03:35:57,405 INFO  resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1075))
- Transitioned to standby state
2015-06-26 03:35:57,405 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1230))
- Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode =
ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
        at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
        at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
        at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 7 more
2015-06-26 03:35:57,407 INFO  resourcemanager.ResourceManager (LogAdapter.java:info(45)) -
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at piripiri3/39.0.40.3
************************************************************/
{code}
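
The excerpt above shows two different failure modes: "Connection refused" for piripiri1 and
piripiri2, and a connection to piripiri3 that is established and then closed by the server. A
small script like the following could tally these outcomes per host from the full attached log.
It is only a sketch: the function name summarize and the outcome labels are hypothetical, and the
regular expression assumes the ZooKeeper client message wording seen in this excerpt.

```python
import re
from collections import Counter

# Matches, in log order, either the start of a connection attempt (capturing
# the host name) or one of the two failure outcomes seen in the excerpt.
# \s+ is used between words because the mailed excerpt is hard-wrapped.
EVENT = re.compile(
    r"Opening\s+socket\s+connection\s+to\s+server\s+(?P<host>[^/\s]+)/"
    r"|(?P<refused>ConnectException:\s+Connection\s+refused)"
    r"|(?P<closed>Unable\s+to\s+read\s+additional\s+data\s+from\s+server)"
)


def summarize(log_text):
    """Count failure outcomes per ZooKeeper host in ResourceManager log text."""
    outcomes = Counter()
    current = None  # host of the most recent connection attempt
    for match in EVENT.finditer(log_text):
        if match.group("host"):
            current = match.group("host")
        elif current is not None:
            kind = "refused" if match.group("refused") else "closed by server"
            outcomes[(current, kind)] += 1
            current = None  # attribute each attempt to at most one outcome
    return outcomes
```

Run against the attached yarn-yarn-resourcemanager-piripiri3.log, the resulting counter would
show which servers refused connections and which accepted and then dropped them. One possible
reading, which the report does not confirm, is that ZooKeeper on piripiri3 was running but not yet
serving requests while the other two servers were not listening at all.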

This issue was observed again on a 3-Master cluster installed with HDP-2.3.0.0-2497 and Ambari-2.1.0-1295.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
