hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-7604) Track and display failed DataNode storage locations in NameNode.
Date Fri, 16 Jan 2015 19:33:35 GMT

     [ https://issues.apache.org/jira/browse/HDFS-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-7604:
--------------------------------
    Attachment: HDFS-7604.001.patch

The attached patch implements the feature.  Summary:
* The protocol definition for heartbeat requests has been changed to add {{failedStorageLocations}},
a list of strings that reports the local file system path of each failed storage.
* The DN calculates its failed storage locations as the set difference between everything
configured in {{dfs.datanode.data.dir}} and the current live volumes in use by the {{FsDatasetImpl}}.
This approach works well with the live DN reconfiguration feature (HDFS-6808), because
it uses the currently active configuration rather than what was loaded at process start
time.
* The failed storage locations are exposed through {{FSDatasetMBean}}, so the metrics on an
individual DN will publish that information.  I also updated {{FsDatasetImpl#getNumFailedVolumes}}
to keep its implementation in sync with the new method.
* {{FsVolumeList}} no longer needs to track a separate counter of failed volumes.  As a side
effect, I believe this fixes a potential bug with live DN reconfiguration.  (If a previously
failed volume was brought back online through live reconfiguration, then I don't believe the
old counter would have been decremented or reset to reflect the new state.)
* On the NN side, the heartbeat handling now updates its data structures to keep track of
the failed storage locations per DN.
* The failed storage locations for all DNs are exposed through {{FSNamesystemMBean}}.  There
is also a new counter for the total volume failures across all DNs.
* The web UI templates have been updated to display the new data.
* {{TestDataNodeVolumeFailureReporting}} contains the tests for this feature.  I
took the opportunity to do a few other minor cleanups in this file.
* Numerous other test files contain minor changes to deal with method signature changes related
to passing the new field in the heartbeat.
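
For illustration, the set-difference calculation described in the second bullet can be
sketched roughly as follows.  This is not code from the patch: the class name
FailedStorageLocations and the method failedLocations are hypothetical, and storage
locations are modeled as plain path strings rather than the DN's real volume types.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of HDFS.
final class FailedStorageLocations {

    /**
     * Returns the configured locations that are not among the live volumes.
     * A LinkedHashSet keeps the result in configured order.
     */
    static Set<String> failedLocations(List<String> configuredDirs, List<String> liveVolumes) {
        Set<String> failed = new LinkedHashSet<>(configuredDirs);
        failed.removeAll(liveVolumes); // set difference: configured minus live
        return failed;
    }

    public static void main(String[] args) {
        // Assumed example values standing in for dfs.datanode.data.dir entries.
        List<String> configured = Arrays.asList("/data/1", "/data/2", "/data/3");
        List<String> live = Arrays.asList("/data/1", "/data/3");
        System.out.println(failedLocations(configured, live)); // prints [/data/2]
    }
}
```

Because the difference is recomputed from the current inputs each time, a volume that
returns to the live set simply drops out of the result, with no separate counter to
reset.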

> Track and display failed DataNode storage locations in NameNode.
> ----------------------------------------------------------------
>
>                 Key: HDFS-7604
>                 URL: https://issues.apache.org/jira/browse/HDFS-7604
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HDFS-7604-screenshot-1.png, HDFS-7604-screenshot-2.png, HDFS-7604-screenshot-3.png, HDFS-7604.001.patch, HDFS-7604.prototype.patch
>
>
> During heartbeats, the DataNode can report a list of its storage locations that have
> been taken out of service due to failure (for example, a bad disk or a permissions problem).
> The NameNode can track these failed storage locations and then report them in JMX and the
> NameNode web UI.
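
As a rough sketch of the NameNode-side bookkeeping described above: each heartbeat
replaces the previous report for that DataNode, and a total is derived across all
DataNodes.  The class FailedStorageTracker, its methods, and the use of a string
DataNode identifier are all hypothetical and are not the data structures used in
the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tracker, for illustration only; not part of HDFS.
final class FailedStorageTracker {
    // Latest failed storage locations reported by each DataNode.
    private final Map<String, List<String>> failedByDataNode = new TreeMap<>();

    /** Records one heartbeat's report, replacing any earlier report from the same DN. */
    synchronized void onHeartbeat(String dataNodeId, List<String> failedStorageLocations) {
        if (failedStorageLocations.isEmpty()) {
            failedByDataNode.remove(dataNodeId); // DN has no failed storage now
        } else {
            failedByDataNode.put(dataNodeId, new ArrayList<>(failedStorageLocations));
        }
    }

    /** Total number of failed volumes across all DataNodes. */
    synchronized int totalVolumeFailures() {
        int total = 0;
        for (List<String> locations : failedByDataNode.values()) {
            total += locations.size();
        }
        return total;
    }

    /** Defensive copy suitable for publishing through a metrics or UI layer. */
    synchronized Map<String, List<String>> snapshot() {
        Map<String, List<String>> copy = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : failedByDataNode.entrySet()) {
            copy.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return copy;
    }

    public static void main(String[] args) {
        FailedStorageTracker tracker = new FailedStorageTracker();
        tracker.onHeartbeat("dn1", Arrays.asList("/data/2", "/data/4"));
        tracker.onHeartbeat("dn2", Arrays.asList("/data/1"));
        System.out.println(tracker.totalVolumeFailures()); // prints 3
        tracker.onHeartbeat("dn1", Collections.<String>emptyList());
        System.out.println(tracker.snapshot()); // prints {dn2=[/data/1]}
    }
}
```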



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
