hadoop-common-dev mailing list archives

From "Lin Yiqun (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-12680) Loss of zookeeper quorum lead all the namenode to be standby state
Date Fri, 25 Dec 2015 08:09:49 GMT
Lin Yiqun created HADOOP-12680:
----------------------------------

             Summary: Loss of zookeeper quorum lead all the namenode to be standby state
                 Key: HADOOP-12680
                 URL: https://issues.apache.org/jira/browse/HADOOP-12680
             Project: Hadoop Common
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.7.1
            Reporter: Lin Yiqun


While upgrading my ZooKeeper cluster, I had to change the IP addresses of the ZK nodes.
During the change, both namenodes of my Hadoop cluster lost their connection to ZK. After I
recovered the ZK cluster, both namenodes had transitioned to standby state, which left the cluster
unable to provide service. I think the reason may be related to the following:
{code}
    /**
     * If the elector gets disconnected from Zookeeper and does not know about
     * the lock state, then it will notify the service via the enterNeutralMode
     * interface. The service may choose to ignore this or stop doing state
     * changing operations. Upon reconnection, the elector verifies the leader
     * status and calls back on the becomeActive and becomeStandby app
     * interfaces. <br/>
     * Zookeeper disconnects can happen due to network issues or loss of
     * Zookeeper quorum. Thus enterNeutralMode can be used to guard against
     * split-brain issues. In such situations it might be prudent to call
     * becomeStandby too. However, such state change operations might be
     * expensive and enterNeutralMode can help guard against doing that for
     * transient issues.
     */
    void enterNeutralMode();
{code}
Maybe we should create a thread that monitors the state of the namenodes and does not let all
of them stay in standby state at the same time.
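
A rough sketch of what such a monitor thread could look like is below. It is only an illustration of the idea, not a patch: the class AllStandbyWatchdog, the HaState enum, the state supplier and the callback are all hypothetical names and do not exist in Hadoop. How the states would really be read (for example through the HA admin interface) and what should happen when every namenode is standby (alerting, rejoining the election) is left open here.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical watchdog; none of these names are part of Hadoop. */
public class AllStandbyWatchdog {
    /** Simplified stand-in for a namenode HA state. */
    public enum HaState { ACTIVE, STANDBY }

    private final Supplier<List<HaState>> stateSource; // assumed way to read the states
    private final Runnable onAllStandby;               // assumed reaction, e.g. raise an alert
    private final int threshold;                       // consecutive checks before reacting
    private int consecutive = 0;

    public AllStandbyWatchdog(Supplier<List<HaState>> stateSource,
                              Runnable onAllStandby, int threshold) {
        this.stateSource = stateSource;
        this.onAllStandby = onAllStandby;
        this.threshold = threshold;
    }

    /** True when at least one namenode is known and none of them is ACTIVE. */
    public static boolean allStandby(List<HaState> states) {
        return !states.isEmpty() && !states.contains(HaState.ACTIVE);
    }

    /**
     * One polling step. The callback only fires after the all-standby
     * condition has been seen `threshold` times in a row, so a short
     * ZK disconnect does not trigger it. Returns true if it fired.
     */
    public synchronized boolean checkOnce() {
        if (allStandby(stateSource.get())) {
            consecutive++;
            if (consecutive >= threshold) {
                consecutive = 0;
                onAllStandby.run();
                return true;
            }
        } else {
            consecutive = 0;
        }
        return false;
    }

    /** Runs checkOnce() periodically on a single background thread. */
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(this::checkOnce, periodSeconds, periodSeconds,
                TimeUnit.SECONDS);
        return ses;
    }
}
```

The threshold is there because, as the comment on enterNeutralMode says, state changes can be expensive and should not be done for transient issues.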



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
