hadoop-hdfs-issues mailing list archives

From "Nicolas Liochon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-4754) Add an API in the namenode to mark a datanode as stale
Date Thu, 25 Apr 2013 13:14:17 GMT
Nicolas Liochon created HDFS-4754:

             Summary: Add an API in the namenode to mark a datanode as stale
                 Key: HDFS-4754
                 URL: https://issues.apache.org/jira/browse/HDFS-4754
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs-client, namenode
            Reporter: Nicolas Liochon
            Priority: Critical

HDFS has detected stale datanodes since HDFS-3703, using a timeout that defaults
to 30s.

There are two reasons to add an API to mark a node as stale even if the timeout has not yet expired:
 1) ZooKeeper can detect that a client is dead at any moment. So, for HBase, we sometimes
start the recovery before a node is marked stale (even with reasonable settings such as stale:
20s; HBase ZK timeout: 30s).
 2) Some third parties could detect that a node is dead before the timeout, hence saving us
the cost of retrying. An example of such hardware is Arista, presented here by [~tsuna] http://tsunanet.net/~tsuna/fsf-hbase-meetup-april13.pdf,
and confirmed in HBASE-6290.

As usual, even if the node is dead it can come back before the 10-minute limit. So I would
propose to set a time bound. The API would be

namenode.markStale(String ipAddress, int port, long durationInMs);

After durationInMs, the namenode would again rely only on its heartbeat to decide.
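
To make the intended semantics concrete, here is a minimal sketch of how a time-bounded stale override could sit next to the heartbeat-based check. It is not the NameNode implementation: the class name StaleOverrideSketch, the recordHeartbeat and isStale methods, the injected clock, and the choice to treat a node with no recorded heartbeat as stale are all assumptions made for illustration. Only the markStale signature comes from the proposal above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the proposed semantics; not actual HDFS code. */
public class StaleOverrideSketch {
    private final long staleIntervalMs;   // heartbeat-based stale timeout
    private final LongSupplier clockMs;   // injected clock, eases testing
    private final Map<String, Long> lastHeartbeatMs = new ConcurrentHashMap<>();
    private final Map<String, Long> forcedStaleUntilMs = new ConcurrentHashMap<>();

    public StaleOverrideSketch(long staleIntervalMs, LongSupplier clockMs) {
        this.staleIntervalMs = staleIntervalMs;
        this.clockMs = clockMs;
    }

    private static String key(String ipAddress, int port) {
        return ipAddress + ":" + port;
    }

    /** Records that a heartbeat was received from the given datanode. */
    public void recordHeartbeat(String ipAddress, int port) {
        lastHeartbeatMs.put(key(ipAddress, port), clockMs.getAsLong());
    }

    /** Forces the node to be reported as stale for durationInMs. */
    public void markStale(String ipAddress, int port, long durationInMs) {
        forcedStaleUntilMs.put(key(ipAddress, port),
                clockMs.getAsLong() + durationInMs);
    }

    /** Explicit override first; once it expires, heartbeat age decides. */
    public boolean isStale(String ipAddress, int port) {
        String k = key(ipAddress, port);
        long now = clockMs.getAsLong();
        Long until = forcedStaleUntilMs.get(k);
        if (until != null) {
            if (now < until) {
                return true;              // override still in force
            }
            forcedStaleUntilMs.remove(k); // expired: drop the override
        }
        Long last = lastHeartbeatMs.get(k);
        // Assumption: a node that never sent a heartbeat counts as stale.
        return last == null || now - last > staleIntervalMs;
    }
}
```

In this sketch a heartbeat received while the override is active does not clear it, so the caller's durationInMs is the only thing that bounds the forced stale state.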


If there are no objections, and if nobody in the hdfs dev team has time to work
on it, I will give it a try for branch 2 & 3.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
