hadoop-user mailing list archives

From Marcin Tustin <mtus...@handybook.com>
Subject Re: Resource manager is down
Date Wed, 04 May 2016 20:08:09 GMT
The biggest win I've seen for the stability of Hadoop components is to give
them their own hard disks, or alternatively their own hosts.
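
A minimal sketch of what the disk-isolation idea looks like for ZooKeeper,
which the RM's state store leans on per the log below: point ZK's snapshot
and transaction-log directories at a device nothing else writes to. The
/zk-disk paths here are placeholders, not anything from this cluster.

    # zoo.cfg -- snapshots and the transaction log on a dedicated disk,
    # so other processes' I/O can't stall ZK writes (paths illustrative)
    dataDir=/zk-disk/zookeeper/data
    dataLogDir=/zk-disk/zookeeper/datalog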

Obviously, you'll also want to check the usual suspects for resource and
processor contention.
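
Since the log below shows the client session timing out (6668 ms without a
heartbeat, against a negotiated 10000 ms session timeout), one more knob
worth a look is the RM's ZooKeeper session timeout and retry interval in
yarn-site.xml. This is a sketch, not a guaranteed fix, and the values are
illustrative:

    <!-- yarn-site.xml: give the RM's ZK session more slack under load;
         30000/2000 are illustrative values, not universal recommendations -->
    <property>
      <name>yarn.resourcemanager.zk-timeout-ms</name>
      <value>30000</value>
    </property>
    <property>
      <name>yarn.resourcemanager.zk-retry-interval-ms</name>
      <value>2000</value>
    </property>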

On Wed, May 4, 2016 at 3:59 PM, Anandha L Ranganathan <analog.sony@gmail.com> wrote:

> The RM keeps going down, and here is the error message we are getting.
> How do we fix the issue?
>
>
> ZK and RM are on the same host.
>
> 2016-05-04 19:17:36,132 INFO  resourcemanager.RMAppManager
> (RMAppManager.java:checkAppNumCompletedLimit(247)) - Max number of
> completed apps kept in state store met: maxCompletedAppsInStateStore =
> 10000, removing app application_1452798563961_0972 from state store.
>
> 2016-05-04 19:17:42,751 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:run(1096)) - Client session timed out, have not heard from
> server in 6668ms for sessionid 0x5547d33e8480000, closing socket connection
> and attempting reconnect
>
> 2016-05-04 19:17:42,851 INFO  recovery.ZKRMStateStore
> (ZKRMStateStore.java:runWithRetries(1110)) - Exception while executing a ZK
> operation.
>
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>             at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>             at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
>             at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:937)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:934)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1076)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1097)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:934)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:948)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:965)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationStateInternal(ZKRMStateStore.java:655)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:148)
>             at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>             at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>             at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>             at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:810)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:864)
>             at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:859)
>             at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>             at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>             at java.lang.Thread.run(Thread.java:745)
>
> 2016-05-04 19:17:42,851 INFO  recovery.ZKRMStateStore
> (ZKRMStateStore.java:runWithRetries(1112)) - Retrying operation on ZK.
> Retry no. 1
>
> 2016-05-04 19:17:42,964 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to
> server ip-10-0-83-40.us-west-2.compute.internal/10.0.83.40:2181. Will not
> attempt to authenticate using SASL (unknown error)
>
> 2016-05-04 19:17:42,965 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:primeConnection(852)) - Socket connection established to
> ip-10-0-83-40.us-west-2.compute.internal/10.0.83.40:2181, initiating
> session
>
> 2016-05-04 19:17:42,969 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:onConnected(1235)) - Session establishment complete on
> server ip-10-0-83-40.us-west-2.compute.internal/10.0.83.40:2181,
> sessionid = 0x5547d33e8480000, negotiated timeout = 10000
>
> 2016-05-04 19:17:42,991 WARN  zookeeper.ClientCnxn
> (ClientCnxn.java:run(1102)) - Session 0x5547d33e8480000 for server
> ip-10-0-83-40.us-west-2.compute.internal/10.0.83.40:2181, unexpected
> error, closing socket connection and attempting reconnect
>
> java.io.IOException: Broken pipe
>             at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>             at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>             at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>             at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>             at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>             at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
>             at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>


