hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9239) DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness
Date Wed, 02 Mar 2016 23:56:18 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-9239:
    Attachment: HDFS-9239.002.patch

I'd like to proceed with this feature, as it has been mentioned as potentially relevant in
comments on other JIRAs.  I'm attaching patch v002 with just a few small changes:
# Rebase on current trunk.
# Address comments from Anu.
# Fix a few Checkstyle warnings.  I think the remaining Checkstyle warnings flagged in the
last pre-commit run are not worth addressing, but I'll review the next pre-commit run for
new warnings.

There had been a suggestion of changing the existing heartbeat handling to use tryLock.  I
explored this a bit, but I'm reluctant to alter mainline heartbeat processing at all.  Overall,
I think this feature is less intrusive as currently implemented, despite the fact that another
RPC server adds some operational complexity.  Perhaps a tryLock-based implementation of heartbeat
handling could be done in a separate JIRA, again gated by a configuration flag, to enable
further experimentation in large clusters.
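
To make the tryLock idea concrete, here is a minimal sketch of one way such a change could look. It is not taken from the HDFS-9239 patch or from the NameNode code: the class TryLockHeartbeatSketch, the tryLockEnabled flag, the namesystemLock field, and the split between "record liveness" and "full processing" are all hypothetical names and assumptions chosen for illustration. The only real API used is java.util.concurrent.locks.ReentrantLock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only; hypothetical names, not code from the HDFS-9239 patch. */
public class TryLockHeartbeatSketch {
  // Hypothetical stand-in for the lock that full heartbeat processing needs.
  private final ReentrantLock namesystemLock = new ReentrantLock();
  // Hypothetical configuration flag gating the tryLock behavior.
  private final boolean tryLockEnabled;
  private volatile long lastContactMs = -1;
  private volatile int fullUpdates = 0; // written only while holding the lock

  public TryLockHeartbeatSketch(boolean tryLockEnabled) {
    this.tryLockEnabled = tryLockEnabled;
  }

  /** Returns true if full processing ran, false if only liveness was recorded. */
  public boolean handleHeartbeat(long nowMs) {
    if (tryLockEnabled) {
      // Record liveness first, without waiting on the contended lock.
      lastContactMs = nowMs;
      if (!namesystemLock.tryLock()) {
        return false; // lock busy: skip the heavier bookkeeping this round
      }
    } else {
      namesystemLock.lock(); // existing behavior: block until the lock is free
      lastContactMs = nowMs;
    }
    try {
      fullUpdates++; // placeholder for the heavier heartbeat bookkeeping
      return true;
    } finally {
      namesystemLock.unlock();
    }
  }

  public long lastContactMs() { return lastContactMs; }

  public int fullUpdates() { return fullUpdates; }

  /** Usage demo: sends one heartbeat while another thread holds the lock. */
  public static boolean heartbeatUnderContention(TryLockHeartbeatSketch s, long nowMs) {
    if (!s.tryLockEnabled) {
      throw new IllegalStateException("demo would block with the flag disabled");
    }
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    Thread holder = new Thread(() -> {
      s.namesystemLock.lock();
      try {
        held.countDown();
        release.await(); // keep the lock until the heartbeat has been handled
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        s.namesystemLock.unlock();
      }
    });
    holder.start();
    try {
      held.await();
      return s.handleHeartbeat(nowMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      release.countDown();
    }
  }
}
```

With the flag enabled, a heartbeat that arrives while the lock is held elsewhere still updates the last-contact time but skips the rest of the processing. With the flag disabled, the sketch blocks on the lock as before, which is the mainline behavior I would prefer not to disturb.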

> DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness
> -----------------------------------------------------------------------------------
>                 Key: HDFS-9239
>                 URL: https://issues.apache.org/jira/browse/HDFS-9239
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: DataNode-Lifeline-Protocol.pdf, HDFS-9239.001.patch, HDFS-9239.002.patch
> This issue proposes introduction of a new feature: the DataNode Lifeline Protocol.  This
> is an RPC protocol that is responsible for reporting liveness and basic health information
> about a DataNode to a NameNode.  Compared to the existing heartbeat messages, it is lightweight
> and not prone to the resource contention problems that can currently harm accurate tracking
> of DataNode liveness.  The attached design document contains more details.

This message was sent by Atlassian JIRA
