hadoop-yarn-issues mailing list archives

From "Arpit Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
Date Fri, 21 Mar 2014 00:03:48 GMT

    [ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942554#comment-13942554 ]

Arpit Gupta commented on YARN-1861:
-----------------------------------

Here is a snippet from the log:

{code}
2014-03-18 09:39:42,544 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(966)) - Opening socket connection to server h2-ha-suse-uns-1395117052-2.cs1cloud.internal/172.18.145.62:2181. Will not attempt to authenticate using SASL (unknown error)
2014-03-18 09:39:42,545 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(849)) - Socket connection established to h2-ha-suse-uns-1395117052-2.cs1cloud.internal/172.18.145.62:2181, initiating session
2014-03-18 09:39:45,437 INFO  zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1211)) - Session establishment complete on server h2-ha-suse-uns-1395117052-2.cs1cloud.internal/172.18.145.62:2181, sessionid = 0x144d394247b0005, negotiated timeout = 10000
2014-03-18 09:39:47,326 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,326 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(755)) - ZKRMStateStore Session disconnected
2014-03-18 09:39:47,326 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(745)) - ZKRMStateStore Session connected
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(751)) - ZKRMStateStore Session restored
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(755)) - ZKRMStateStore Session disconnected
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(745)) - ZKRMStateStore Session connected
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(751)) - ZKRMStateStore Session restored
2014-03-18 09:39:47,328 FATAL resourcemanager.ResourceManager (ResourceManager.java:handle(652)) - Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_FENCED. Cause:
org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException: RMStateStore has been fenced
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread.run(ZKRMStateStore.java:880)

2014-03-18 09:39:47,328 INFO  resourcemanager.ResourceManager (ResourceManager.java:handle(656)) - RMStateStore has been fenced
2014-03-18 09:39:47,328 INFO  resourcemanager.ResourceManager (ResourceManager.java:handle(660)) - Transitioning RM to Standby mode
2014-03-18 09:39:47,328 INFO  resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(872)) - Transitioning to standby state
{code}
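
For anyone trying to follow the sequence in a longer RM log, here is a rough Python sketch that pulls out the ZooKeeper watcher states and any FATAL entries in order. It is not part of Hadoop; the function name `summarize` and the regular expressions are my own assumptions, based only on the line layout seen in the snippet, and it assumes each log entry sits on a single line. The sample lines are abbreviated copies of three entries from the log.

```python
import re

# Assumed layout, taken from the snippet: "<date> <time>,<ms> <LEVEL> <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+(.*)$")
# Watcher state as printed by ZKRMStateStore.processWatchEvent in the snippet
STATE_RE = re.compile(r"Watcher event type: \w+ with state:(\w+)")


def summarize(log_text):
    """Return (timestamp, event) pairs for watcher states and FATAL lines."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # stack-trace or continuation line, skip it
        ts, level, msg = m.groups()
        state = STATE_RE.search(msg)
        if state:
            events.append((ts, state.group(1)))
        elif level == "FATAL":
            events.append((ts, "FATAL"))
    return events


# Abbreviated copies of three entries from the log (trailing text cut off)
SAMPLE = """\
2014-03-18 09:39:47,326 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with state:Disconnected for path:null
2014-03-18 09:39:47,326 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with state:SyncConnected for path:null
2014-03-18 09:39:47,328 FATAL resourcemanager.ResourceManager (ResourceManager.java:handle(652)) - Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_FENCED.
"""

for ts, event in summarize(SAMPLE):
    print(ts, event)
```

Run against the full log, this kind of summary makes it easier to see how closely the Disconnected/SyncConnected flips precede the STATE_STORE_FENCED event.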

> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>
>                 Key: YARN-1861
>                 URL: https://issues.apache.org/jira/browse/YARN-1861
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>
> In our HA tests we noticed that the tests got stuck because both RMs went into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
