hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1848) Datanodes should shutdown when a critical volume fails
Date Thu, 28 Apr 2011 15:45:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026355#comment-13026355 ]

Steve Loughran commented on HDFS-1848:
--------------------------------------

+1 for more health checking, with easy ways to specify what you want to check (presumably a
script to exec is the option of choice, or some Java class to call).

Some standard checks for HDDs (you see them in ant -diagnostics):
 - can you write to a dir
 - can you get back what you wrote
 - is the timestamp of the file roughly in sync with your clock (on network drives it may not
   be)
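
As a rough illustration of the three checks listed above, here is a minimal Python sketch. It
is not taken from ant or Hadoop; the function name check_dir, the probe file name and the
60-second skew threshold are all assumptions made for the example. Note that a read straight
after a write may be served from the OS page cache, so the read-back step is only a weak test
of the disk itself.

```python
import os
import time
import uuid


def check_dir(path, max_skew_seconds=60.0):
    """Return a list of problems found in `path`; an empty list means healthy.

    Hypothetical helper for illustration only.
    """
    problems = []
    probe = os.path.join(path, ".healthcheck-" + uuid.uuid4().hex)
    payload = uuid.uuid4().bytes
    try:
        # 1. Can we write to the directory?
        with open(probe, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # 2. Do we get back what we wrote?
        with open(probe, "rb") as f:
            if f.read() != payload:
                problems.append("read-back mismatch")
        # 3. Is the file timestamp roughly in sync with the local clock?
        skew = abs(os.path.getmtime(probe) - time.time())
        if skew > max_skew_seconds:
            problems.append("timestamp skew of %.1fs" % skew)
    except OSError as e:
        problems.append("I/O error: %s" % e)
    finally:
        # Best-effort cleanup; the probe may never have been created.
        try:
            os.remove(probe)
        except OSError:
            pass
    return problems
```

A caller would treat any non-empty list as a failed volume, e.g. check_dir("/data/1") where the
path is whatever directory the admin has configured.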

If you are aggressive you could try to create a large file and see what happens, though if
the health check hangs, something else will need to detect that and report it as a failure.
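
One way to sketch the "something else detects the hang" part, again in Python and purely as an
assumption about how it could be done: run the check in a daemon thread and have the caller
give up after a deadline. The names write_large_file and run_with_timeout are hypothetical. A
thread blocked in I/O cannot be killed, so this only reports the hang; a real deployment would
more likely exec the check as a separate process that can be terminated.

```python
import os
import threading


def write_large_file(path, size_bytes, chunk=1 << 20):
    """Write `size_bytes` of zeros to `path` and force them to disk."""
    block = b"\0" * chunk
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(block[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def run_with_timeout(check, timeout_seconds):
    """Run check() in a daemon thread; return 'ok', 'failed: ...' or 'timeout'."""
    result = {}

    def target():
        try:
            check()
            result["status"] = "ok"
        except Exception as e:
            result["status"] = "failed: %s" % e

    t = threading.Thread(target=target, daemon=True)
    t.start()
    t.join(timeout_seconds)
    if t.is_alive():
        # The check is still running: report it as a failure instead of waiting.
        return "timeout"
    return result["status"]
```

The watchdog treats "timeout" the same as any other failure, which is the behaviour asked for
above: a hung check must not leave the node looking healthy.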

Log drives also cause problems when they are missing or full.

> Datanodes should shutdown when a critical volume fails
> ------------------------------------------------------
>
>                 Key: HDFS-1848
>                 URL: https://issues.apache.org/jira/browse/HDFS-1848
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Eli Collins
>             Fix For: 0.23.0
>
>
> A DN should shutdown when a critical volume (eg the volume that hosts the OS, logs, pid,
> tmp dir etc.) fails. The admin should be able to specify which volumes are critical, eg they
> might specify the volume that lives on the boot disk. A failure in one of these volumes would
> not be subject to the threshold (HDFS-1161) or result in host decommissioning (HDFS-1847)
> as the decommissioning process would likely fail.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
