hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1848) Datanodes should shutdown when a critical volume fails
Date Tue, 26 Apr 2011 18:07:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025349#comment-13025349 ]

Eli Collins commented on HDFS-1848:

I agree the datanode should only check the validity of all the directories where it is configured
to store data. 

Point #1 is limited to allowing an administrator to specify that not all of these configured
directories should necessarily be treated equally wrt the policy for tolerating failures. I.e.
the idea is *not* to use dfs.data.dir for general datanode health monitoring. There are already
plenty of tools that monitor disk health; HDFS should just do the right thing when it experiences
a failure.

Point #2 is that - in general - if the datanode experiences certain failures (eg those caused
by a failed root disk) it should fail-stop.

Another way to put this is that the datanode should be *proactive* about checking for failures
in its data volumes and *reactive* about other disk failures (eg of the root disk).
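
To make the distinction concrete, here is a minimal sketch of the kind of policy being discussed.
It is not the actual datanode code: the class name, the method name, and the example paths are
all hypothetical, and it assumes any failed volume that is not marked critical counts as a data
volume subject to a tolerated-failure threshold (in the spirit of HDFS-1161).

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a volume failure policy; not the real HDFS implementation. */
final class VolumeFailurePolicy {
    // Volumes the admin has marked critical (assumed to be configured somewhere).
    private final Set<String> criticalVolumes;
    // Number of failed data volumes the datanode may tolerate before stopping.
    private final int toleratedDataVolumeFailures;

    VolumeFailurePolicy(Set<String> criticalVolumes, int toleratedDataVolumeFailures) {
        this.criticalVolumes = new HashSet<>(criticalVolumes);
        this.toleratedDataVolumeFailures = toleratedDataVolumeFailures;
    }

    /** Returns true if the datanode should fail-stop given the currently failed volumes. */
    boolean shouldShutDown(Collection<String> failedVolumes) {
        int failedDataVolumes = 0;
        // De-duplicate so a volume reported twice is only counted once.
        for (String volume : new HashSet<>(failedVolumes)) {
            if (criticalVolumes.contains(volume)) {
                // A critical volume failure is not subject to the threshold.
                return true;
            }
            failedDataVolumes++;
        }
        // Data volume failures only stop the datanode once the threshold is exceeded.
        return failedDataVolumes > toleratedDataVolumeFailures;
    }
}
```

With "/" marked critical and a threshold of 1, a single failed data directory would be tolerated,
while a failure of "/" would stop the datanode immediately.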

> Datanodes should shutdown when a critical volume fails
> ------------------------------------------------------
>                 Key: HDFS-1848
>                 URL: https://issues.apache.org/jira/browse/HDFS-1848
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Eli Collins
>             Fix For: 0.23.0
> A DN should shutdown when a critical volume (eg the volume that hosts the OS, logs, pid,
> tmp dir etc.) fails. The admin should be able to specify which volumes are critical, eg they
> might specify the volume that lives on the boot disk. A failure in one of these volumes would
> not be subject to the threshold (HDFS-1161) or result in host decommissioning (HDFS-1847)
> as the decommissioning process would likely fail.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira