hadoop-hdfs-dev mailing list archives

From "Ron Bodkin (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-2420) improve handling of datanode timeouts
Date Sat, 08 Oct 2011 13:50:29 GMT
improve handling of datanode timeouts
-------------------------------------

                 Key: HDFS-2420
                 URL: https://issues.apache.org/jira/browse/HDFS-2420
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Ron Bodkin


If a datanode ever times out on a heartbeat, it is marked dead permanently. I am finding
that on AWS this happens periodically: datanodes time out even though the datanode
process is still alive. The current workaround is to kill and restart each such process
individually.

It would be good to have more retry logic (e.g., blacklisting the nodes but continuing to
try heartbeats for a longer period before declaring them dead). It would also be good if
refreshNodes checked for timed-out datanodes and attempted to recover them.
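
To make the retry idea concrete, here is a minimal sketch in Java. It is not the
namenode's actual heartbeat code: the class HeartbeatTracker, its State enum, and both
thresholds are hypothetical names and values chosen only for illustration. A silent node
is first treated as a suspect (blacklisted) and is declared dead only after a longer
silence; any late heartbeat brings it back to live without a restart.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a two-threshold heartbeat tracker; not an HDFS API. */
class HeartbeatTracker {
    enum State { LIVE, SUSPECT, DEAD }

    private final long suspectAfterMs; // silence after which a node is blacklisted
    private final long deadAfterMs;    // longer silence after which it is declared dead
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    HeartbeatTracker(long suspectAfterMs, long deadAfterMs) {
        if (suspectAfterMs <= 0 || deadAfterMs < suspectAfterMs) {
            throw new IllegalArgumentException("need 0 < suspectAfterMs <= deadAfterMs");
        }
        this.suspectAfterMs = suspectAfterMs;
        this.deadAfterMs = deadAfterMs;
    }

    /** Records a heartbeat; a late heartbeat revives a suspect or dead node. */
    void recordHeartbeat(String nodeId, long nowMs) {
        lastHeartbeatMs.put(nodeId, nowMs);
    }

    /** Classifies a node by how long it has been silent as of nowMs. */
    State stateOf(String nodeId, long nowMs) {
        Long last = lastHeartbeatMs.get(nodeId);
        if (last == null) {
            return State.DEAD; // never heard from this node
        }
        long silenceMs = nowMs - last;
        if (silenceMs >= deadAfterMs) {
            return State.DEAD;
        }
        if (silenceMs >= suspectAfterMs) {
            return State.SUSPECT; // blacklisted, but still allowed to recover
        }
        return State.LIVE;
    }
}
```

The state is derived from the last heartbeat time rather than stored, so no node can get
stuck in a dead state: a refreshNodes-style recheck would only need to call stateOf again.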


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
