hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7833) DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated.
Date Thu, 19 Mar 2015 23:22:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370321#comment-14370321
] 

Chris Nauroth commented on HDFS-7833:
-------------------------------------

Hi [~eddyxu].  Thank you for the patch.  Overall, this looks correct to me.  When I filed
this issue, it was before some of the discussion we had in HDFS-7722, and I had 2 cases in
mind that could trigger this bug:

* Admin reconfigures DataNode to remove a path that has a failed volume.  As per discussion
in HDFS-7722, we've made the decision that this case should not clear volume failure information.
 Since this is logically still considered a volume failure, there is no harm done to the check
for sufficient resources.  IOW, after the discussion in HDFS-7722, we don't have to worry
about this case anymore.
* Admin reconfigures DataNode and adds a few new paths that weren't there before.  This case
is still a problem.

To properly cover the second case, let's add a test that does something like this:
# Start a DataNode with 2 volumes and {{dfs.datanode.failed.volumes.tolerated}} set to 1.
# Run DataNode reconfiguration to add a new volume.  Now we're up to 3 volumes total.
# Fail a volume.  Assert that the DataNode continues running.
# Fail another volume.  Assert that the DataNode stops running.

Without your patch, I expect this test would fail on the last step, because {{validVolsRequired}}
would have been calculated as 1 at startup (2 volumes minus 1 tolerated failure) and never
updated, and we still have 1 volume remaining, so the DataNode would keep running.  After
applying your patch, I expect the test would then pass.
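
The arithmetic behind this expectation can be sketched with a small toy model.  This is
not the real {{FsDatasetImpl}}: the class, its methods, and the {{recalculateOnReconfig}}
flag are hypothetical names, and it assumes the resource check is simply "healthy volumes >=
configured volumes minus tolerated failures".

```java
/**
 * Toy model of the volume check discussed above. All names are hypothetical;
 * this is not the real FsDatasetImpl.
 */
class VolumeToleranceSketch {
    private final int failedVolumesTolerated;
    private final boolean recalculateOnReconfig; // true models the patched behavior
    private int configuredVolumes;
    private int healthyVolumes;
    private int validVolsRequired;

    VolumeToleranceSketch(int volumes, int failedVolumesTolerated,
                          boolean recalculateOnReconfig) {
        this.configuredVolumes = volumes;
        this.healthyVolumes = volumes;
        this.failedVolumesTolerated = failedVolumesTolerated;
        this.recalculateOnReconfig = recalculateOnReconfig;
        // Computed once at startup: 2 volumes - 1 tolerated = 1 required.
        this.validVolsRequired = volumes - failedVolumesTolerated;
    }

    /** Models a reconfiguration that adds one new volume. */
    void addVolume() {
        configuredVolumes++;
        healthyVolumes++;
        if (recalculateOnReconfig) {
            validVolsRequired = configuredVolumes - failedVolumesTolerated;
        }
    }

    void failVolume() {
        healthyVolumes--;
    }

    /** Assumed form of the check: enough healthy volumes remain. */
    boolean hasEnoughResources() {
        return healthyVolumes >= validVolsRequired;
    }

    /** Runs the four test steps; returns {running after 1st failure, running after 2nd}. */
    static boolean[] runScenario(boolean recalculateOnReconfig) {
        VolumeToleranceSketch dn = new VolumeToleranceSketch(2, 1, recalculateOnReconfig);
        dn.addVolume();                       // now 3 volumes total
        dn.failVolume();
        boolean afterFirst = dn.hasEnoughResources();
        dn.failVolume();
        boolean afterSecond = dn.hasEnoughResources();
        return new boolean[] { afterFirst, afterSecond };
    }

    public static void main(String[] args) {
        // Stale value (unpatched model): still "running" with 1 of 3 volumes left.
        System.out.println(java.util.Arrays.toString(runScenario(false))); // [true, true]
        // Recalculated value (patched model): stops after the second failure.
        System.out.println(java.util.Arrays.toString(runScenario(true)));  // [true, false]
    }
}
```

In the unpatched model the required count stays at 1 after the third volume is added, so
the last assertion of the proposed test would not hold.  With recalculation the required
count becomes 2, and the second failure drops the DataNode below it.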

> DataNode reconfiguration does not recalculate valid volumes required, based on configured
failed volumes tolerated.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7833
>                 URL: https://issues.apache.org/jira/browse/HDFS-7833
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: Chris Nauroth
>            Assignee: Lei (Eddy) Xu
>         Attachments: HDFS-7833.000.patch
>
>
> DataNode reconfiguration never recalculates {{FsDatasetImpl#validVolsRequired}}.  This
may cause incorrect behavior of the {{dfs.datanode.failed.volumes.tolerated}} property if
reconfiguration causes the DataNode to run with a different total number of volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
