hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-7604) Track and display failed DataNode storage locations in NameNode.
Date Fri, 16 Jan 2015 19:33:35 GMT

     [ https://issues.apache.org/jira/browse/HDFS-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-7604:
    Attachment: HDFS-7604.001.patch

The attached patch implements the feature.  Summary:
* The protocol definition for heartbeat requests has been changed to add {{failedStorageLocations}},
which contains multiple strings used to report the local file system path of each failed storage.
* The DN calculates its failed storage locations as the set difference between everything
configured in {{dfs.datanode.data.dir}} and the current live volumes in use by the {{FsDatasetImpl}}.
Doing it this way works well with the live DN reconfiguration feature (HDFS-6808), because
it uses the current active configuration rather than what was loaded at process start.
* The failed storage locations are exposed through {{FSDatasetMBean}}, so the metrics on an
individual DN will publish that information.  I also updated {{FsDatasetImpl#getNumFailedVolumes}}
to keep its implementation in sync with the new method.
* {{FsVolumeList}} no longer needs to track a separate counter of failed volumes.  As a side
effect, I believe this fixes a potential bug with live DN reconfiguration.  (If a previously
failed volume was brought back online through live reconfiguration, then I don't believe this
counter would have been decremented or reset to reflect the new state.)
* On the NN side, the heartbeat handling now updates its data structures to keep track of
the failed storage locations per DN.
* The failed storage locations for all DNs are exposed through {{FSNamesystemMBean}}.  There
is also a new counter for the total volume failures across all DNs.
* The web UI templates have been updated to display the new data.
* {{TestDataNodeVolumeFailureReporting}} contains the testing related to this feature.  I
took the opportunity to do a few other minor cleanups in this file.
* Numerous other test files contain minor changes to deal with method signature changes related
to passing the new field in the heartbeat.
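
For illustration only, the set-difference calculation on the DN side can be sketched roughly as
below.  This is not code from the patch: the class name, the method name and the example paths
are hypothetical, and plain strings stand in for whatever types the DN actually uses for
configured locations and live volumes.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not taken from HDFS-7604.001.patch.
class FailedStorageLocationsSketch {

  /**
   * Returns every configured location that is not backed by a live volume.
   * A LinkedHashSet keeps the result in configuration order.
   */
  static Set<String> failedLocations(Collection<String> configured,
      Collection<String> live) {
    Set<String> failed = new LinkedHashSet<>(configured);
    failed.removeAll(live);  // set difference: configured minus live
    return failed;
  }

  public static void main(String[] args) {
    // Hypothetical values standing in for dfs.datanode.data.dir entries.
    List<String> configured = Arrays.asList("/data/1", "/data/2", "/data/3");
    // Hypothetical volumes currently in use by the dataset.
    List<String> live = Arrays.asList("/data/1", "/data/3");
    System.out.println(failedLocations(configured, live));  // prints [/data/2]
  }
}
```

Because the result is recomputed from the current inputs each time, a volume that comes back
online simply drops out of the set, with no separate counter to keep in sync.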

> Track and display failed DataNode storage locations in NameNode.
> ----------------------------------------------------------------
>                 Key: HDFS-7604
>                 URL: https://issues.apache.org/jira/browse/HDFS-7604
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HDFS-7604-screenshot-1.png, HDFS-7604-screenshot-2.png, HDFS-7604-screenshot-3.png,
> HDFS-7604.001.patch, HDFS-7604.prototype.patch
>
> During heartbeats, the DataNode can report a list of its storage locations that have
> been taken out of service due to failure (such as a bad disk or a permissions problem).
> The NameNode can track these failed storage locations and then report them in JMX and the
> NameNode web UI.
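
As a rough illustration of the NameNode-side bookkeeping described above, the sketch below keeps
the most recent list of failed storage locations per DataNode and derives a cluster-wide total.
It is an assumption-laden sketch, not the NameNode's actual data structures: the class, the
method names and the string DataNode identifiers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; not the NameNode's real heartbeat handling.
class FailedStorageTrackerSketch {
  // Latest failed storage locations per DataNode, keyed by a hypothetical ID.
  private final Map<String, List<String>> failedByDataNode = new TreeMap<>();

  /** Replaces the previous report for this DataNode with the one in the heartbeat. */
  synchronized void onHeartbeat(String dataNodeId, List<String> failedStorageLocations) {
    if (failedStorageLocations.isEmpty()) {
      failedByDataNode.remove(dataNodeId);  // nothing failed: forget old entry
    } else {
      failedByDataNode.put(dataNodeId, new ArrayList<>(failedStorageLocations));
    }
  }

  /** Total number of failed volumes across all DataNodes. */
  synchronized int totalVolumeFailures() {
    int total = 0;
    for (List<String> locations : failedByDataNode.values()) {
      total += locations.size();
    }
    return total;
  }

  /** Read-only copy suitable for publishing through a metrics or UI layer. */
  synchronized Map<String, List<String>> snapshot() {
    Map<String, List<String>> copy = new TreeMap<>();
    for (Map.Entry<String, List<String>> e : failedByDataNode.entrySet()) {
      copy.put(e.getKey(), Collections.unmodifiableList(new ArrayList<>(e.getValue())));
    }
    return Collections.unmodifiableMap(copy);
  }
}
```

Replacing the whole per-DataNode list on every heartbeat means the tracked state always reflects
the latest report, including the case where a previously failed location has recovered.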

This message was sent by Atlassian JIRA
