Mailing-List: contact hdfs-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-dev@hadoop.apache.org
Date: Fri, 30 Jan 2015 21:05:35 +0000 (UTC)
From: "Chris Nauroth (JIRA)" <jira@apache.org>
To: hdfs-dev@hadoop.apache.org
Message-ID: <JIRA.12771422.1422651917000.218406.1422651935142@Atlassian.JIRA>
In-Reply-To: <JIRA.12771422.1422651917000@Atlassian.JIRA>
References: <JIRA.12771422.1422651917000@Atlassian.JIRA>
 <JIRA.12771422.1422651917520@arcas>
Subject: [jira] [Created] (HDFS-7714) Simultaneous restart of HA NameNodes
 and DataNode can cause DataNode to register successfully with only one
 NameNode.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Chris Nauroth created HDFS-7714:
-----------------------------------

             Summary: Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode.
                 Key: HDFS-7714
                 URL: https://issues.apache.org/jira/browse/HDFS-7714
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.6.0
            Reporter: Chris Nauroth


In an HA deployment, DataNodes must register with both NameNodes and send periodic heartbeats and block reports to both.  However, if the NameNodes and DataNodes are restarted simultaneously, this can trigger a race condition in registration.  The end result is that the {{BPServiceActor}} for one NameNode terminates, while the {{BPServiceActor}} for the other NameNode remains alive.  The DataNode process is then in a "half-alive" state in which it sends heartbeats and block reports to only one of the NameNodes.  This could cause a loss of storage capacity after an HA failover, if the failover is to the NameNode that the DataNode no longer reports to.  The DataNode process would have to be restarted to resolve this.
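
To make the "half-alive" state concrete, here is a minimal sketch of how it could be modeled and detected.  It is not Hadoop code and does not reproduce the race itself: {{HalfAliveSketch}}, {{ActorModel}}, {{nameNodesWithoutActor}} and {{isHalfAlive}} are hypothetical names used only for illustration, with {{ActorModel}} standing in for a per-NameNode actor such as {{BPServiceActor}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model, not Hadoop code: one ActorModel per NameNode. */
class HalfAliveSketch {

    /** Simplified stand-in for a per-NameNode actor such as BPServiceActor. */
    static final class ActorModel {
        final String nameNodeId;
        private boolean alive = true;

        ActorModel(String nameNodeId) {
            this.nameNodeId = nameNodeId;
        }

        /** Models the actor exiting, e.g. after losing the registration race. */
        void terminate() {
            alive = false;
        }

        boolean isAlive() {
            return alive;
        }
    }

    /** Returns the ids of the NameNodes whose actor has terminated. */
    static List<String> nameNodesWithoutActor(List<ActorModel> actors) {
        List<String> missing = new ArrayList<>();
        for (ActorModel actor : actors) {
            if (!actor.isAlive()) {
                missing.add(actor.nameNodeId);
            }
        }
        return missing;
    }

    /** "Half-alive": at least one actor terminated, but not all of them. */
    static boolean isHalfAlive(List<ActorModel> actors) {
        int dead = nameNodesWithoutActor(actors).size();
        return dead > 0 && dead < actors.size();
    }

    public static void main(String[] args) {
        ActorModel nn1 = new ActorModel("nn1");
        ActorModel nn2 = new ActorModel("nn2");
        List<ActorModel> actors = Arrays.asList(nn1, nn2);

        // Healthy HA DataNode: both actors are running.
        System.out.println("half-alive before: " + isHalfAlive(actors));

        // Outcome described in this issue: one actor terminates, one survives.
        nn2.terminate();
        System.out.println("half-alive after: " + isHalfAlive(actors)
            + ", no actor for: " + nameNodesWithoutActor(actors));
    }
}
```

The point of the sketch is that process liveness alone does not reveal the problem: the DataNode process is still running, so a check has to look at each NameNode's actor individually.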

