Date: Fri, 4 Mar 2016 23:51:41 +0000 (UTC)
From: "Chris Nauroth (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-9239) DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness

     [ https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-9239:
--------------------------------
    Release Note: This release adds a new feature called the DataNode Lifeline Protocol. If configured, then DataNodes can report that they are still alive to the NameNode via a fallback protocol, separate from the existing heartbeat messages.
This can prevent the NameNode from incorrectly marking DataNodes as stale or dead in highly overloaded clusters where heartbeat processing is suffering delays. For more information, please refer to the hdfs-default.xml documentation for several new configuration properties: dfs.namenode.lifeline.rpc-address, dfs.namenode.lifeline.rpc-bind-host, dfs.datanode.lifeline.interval.seconds, dfs.namenode.lifeline.handler.ratio and dfs.namenode.lifeline.handler.count.

> DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-9239
>                 URL: https://issues.apache.org/jira/browse/HDFS-9239
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>             Fix For: 2.8.0
>
>         Attachments: DataNode-Lifeline-Protocol.pdf, HDFS-9239.001.patch, HDFS-9239.002.patch, HDFS-9239.003.patch
>
>
> This issue proposes the introduction of a new feature: the DataNode Lifeline Protocol. This is an RPC protocol that is responsible for reporting liveness and basic health information about a DataNode to a NameNode. Compared to the existing heartbeat messages, it is lightweight and not prone to the resource contention problems that can currently harm accurate tracking of DataNode liveness. The attached design document contains more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
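
For readers who want to see what setting the properties named in the release note might look like, here is a minimal sketch in plain Java that assembles a few of them and renders them in the `<property>` layout used by Hadoop site files. The property names come from the release note. Everything else is an assumption made for illustration: the class and method names are hypothetical, the host name, port and numeric values are placeholders rather than defaults or recommendations, and the sketch does not use any Hadoop API. It sets the handler count and leaves dfs.namenode.lifeline.handler.ratio unset; the hdfs-default.xml documentation remains the authority on what each property means and how the two handler properties relate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: collects lifeline-related
// properties and prints them as an hdfs-site.xml style fragment.
class LifelineConfigSketch {

    // Property names are taken from the release note. Every value here is a
    // placeholder chosen for illustration only.
    static Map<String, String> lifelineProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("dfs.namenode.lifeline.rpc-address", "nn1.example.com:8050");
        props.put("dfs.namenode.lifeline.rpc-bind-host", "0.0.0.0");
        props.put("dfs.datanode.lifeline.interval.seconds", "9");
        props.put("dfs.namenode.lifeline.handler.count", "4");
        return props;
    }

    // Renders the map as <property> elements inside <configuration>.
    // No XML escaping is done; the placeholder values above do not need it.
    static String toPropertyXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(e.getValue()).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toPropertyXml(lifelineProperties()));
    }
}
```

Running the class prints a fragment that could be adapted into a site configuration file once the placeholder values are replaced with ones appropriate to the cluster.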