hadoop-hdfs-issues mailing list archives

From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HDFS-2420) improve handling of datanode timeouts
Date Thu, 03 Apr 2014 02:32:16 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze resolved HDFS-2420.

    Resolution: Not a Problem

I guess that this is not a problem anymore. Please feel free to reopen this if I am wrong.
Resolving ...

> improve handling of datanode timeouts
> -------------------------------------
>                 Key: HDFS-2420
>                 URL: https://issues.apache.org/jira/browse/HDFS-2420
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ron Bodkin
> If a datanode ever times out on a heartbeat, it gets marked dead permanently. I am finding
> that on AWS this is a periodic occurrence, i.e., datanodes time out although the datanode
> process is still alive. The current solution to this is to kill and restart each such process.
> It would be good if there were more retry logic (e.g., blacklisting the nodes but trying
> heartbeats for a longer period before determining they are apparently dead). It would also
> be good if refreshNodes would check and attempt to recover timed-out datanodes.
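
For context on the timeout being discussed, here is a minimal sketch of how the NameNode's dead-node threshold is commonly derived in recent HDFS releases, assuming the stock defaults for dfs.namenode.heartbeat.recheck-interval (300000 ms) and dfs.heartbeat.interval (3 s). The property names and defaults may differ in the release the reporter was running, and the Python function names are hypothetical, chosen only for illustration. This is not Hadoop code.

```python
# Sketch only: mirrors the commonly documented formula
#   expire = 2 * recheck_interval + 10 * heartbeat_interval
# The defaults below are assumptions taken from recent hdfs-default.xml values.

def heartbeat_expire_interval_ms(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Return how long the NameNode waits without a heartbeat before
    treating a datanode as dead, in milliseconds."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


def is_considered_dead(last_heartbeat_ms, now_ms, expire_ms):
    """Hypothetical helper: a node counts as dead once the time since its
    last heartbeat exceeds the expiry interval."""
    return now_ms - last_heartbeat_ms > expire_ms


if __name__ == "__main__":
    expire = heartbeat_expire_interval_ms()
    print(f"dead-node threshold: {expire} ms ({expire / 60000} minutes)")
```

Under these assumed defaults the threshold works out to 630000 ms (10.5 minutes). Raising the recheck interval is one way to tolerate longer heartbeat gaps, such as the periodic stalls described above, at the cost of slower detection of nodes that are really down.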

