hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
Date Mon, 16 May 2011 05:21:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033875#comment-13033875 ]

Eli Collins edited comment on HDFS-1592 at 5/16/11 5:21 AM:
------------------------------------------------------------

The intent of this jira (as I understand it, see HDFS-1849) is that the DN should start even
if there are failed volumes, as long as the number of failed volumes is <= {{dfs.datanode.failed.volumes.tolerated}}.
The use case is that an admin configures n volume failures to tolerate; then, when the cluster
is restarted, all the nodes with at most n failed volumes should start up, i.e. restarting the
DN should respect the {{dfs.datanode.failed.volumes.tolerated}} value so you don't end up
with a cluster with DNs that were running successfully but fail to restart.
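
As a rough illustration of the rule described above, the startup decision comes down to comparing the failed-volume count with the configured tolerance. The class and method names here are hypothetical; they are not taken from the HDFS source or from the attached patches.

```java
// Hypothetical sketch of the startup rule; not the actual DataNode code.
final class VolumeToleranceRule {

    /**
     * Returns true if a DN with the given number of failed volumes should
     * be allowed to start, given the configured tolerance (the value of
     * dfs.datanode.failed.volumes.tolerated).
     */
    static boolean shouldStart(int failedVolumes, int toleratedFailures) {
        if (failedVolumes < 0 || toleratedFailures < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // Start as long as failures do not exceed the tolerated number.
        return failedVolumes <= toleratedFailures;
    }

    public static void main(String[] args) {
        System.out.println(shouldStart(1, 1)); // true
        System.out.println(shouldStart(2, 1)); // false
    }
}
```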

With the current patch the DN will refuse to come up if *any* of the volumes have failed,
no matter how {{dfs.datanode.failed.volumes.tolerated}} is configured. We need tests that verify:
* A DN will successfully start with a failed volume as long as it's configured to tolerate
a failed volume
* A DN will fail to start if the number of failed volumes exceeds the number tolerated
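
The two cases above could be sketched roughly as follows. This is a self-contained stand-in, not a real HDFS test: {{StartupToleranceSketch}}, {{start}} and {{startsCleanly}} are hypothetical names, and volume health is simulated with a list of booleans instead of real data directories.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for DN startup; a real test would bring up an
// actual DataNode with failed data directories.
final class StartupToleranceSketch {

    /**
     * Simulates startup: returns the number of usable volumes, or throws
     * if more volumes have failed than the configured tolerance allows.
     */
    static int start(List<Boolean> volumeHealthy, int tolerated) {
        int failed = 0;
        for (boolean healthy : volumeHealthy) {
            if (!healthy) {
                failed++;
            }
        }
        if (failed > tolerated) {
            throw new IllegalStateException(
                failed + " failed volumes, only " + tolerated + " tolerated");
        }
        return volumeHealthy.size() - failed;
    }

    /** Returns true if the simulated startup completes without throwing. */
    static boolean startsCleanly(List<Boolean> volumeHealthy, int tolerated) {
        try {
            start(volumeHealthy, tolerated);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Case 1: one failed volume, one tolerated; startup should succeed.
        System.out.println(startsCleanly(Arrays.asList(true, false, true), 1));
        // Case 2: two failed volumes, one tolerated; startup should fail.
        System.out.println(startsCleanly(Arrays.asList(false, false, true), 1));
    }
}
```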

Make sense?

      was (Author: eli):
    The intent of this jira (as I understand it, see HDFS-1849) is that the Datanode should
start even if there are failed volumes, as long as the number of failed volumes is <= dfs.datanode.failed.volumes.tolerated.
The use case is that an admin configures n volumes failures to tolerate, then whey they restart
the cluster all the nodes with less than n failed volumes should startup, ie restarting the
cluster should not result in the datanodes that were running fine to fail to startup because
the number of volume failures tolerated is not being checked.

With the current patch the DN will refuse to come up if *any* of the volumes have failed,
no matter how dfs.datanode.failed.volumes.tolerated is configured. We need a tests that verifies:
* A DN will successfully start  with a failed volume as long as it's configured to tolerate
a failed volume
* A DN will fail to start if more than the number of tolerated volumes are failed

Make sense?
  
> Datanode startup doesn't honor volumes.tolerated 
> -------------------------------------------------
>
>                 Key: HDFS-1592
>                 URL: https://issues.apache.org/jira/browse/HDFS-1592
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.204.0
>            Reporter: Bharath Mundlapudi
>            Assignee: Bharath Mundlapudi
>             Fix For: 0.20.204.0, 0.23.0
>
>         Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-rel20.patch
>
>
> Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira