hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11340) DataNode reconfigure for disks doesn't remove the failed volumes
Date Fri, 10 Feb 2017 10:36:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861067#comment-15861067 ]

Yiqun Lin commented on HDFS-11340:

Thanks for updating the patch, [~manojg]!
The latest patch looks almost good. Two minor comments:

* I suggest adding a log message when a failed storage location is not present in the new storage
locations. That is the case we mainly care about here.
    // Use the failed storage locations from the current conf
    // to detect removals in the new conf.
    if (getFSDataset().getNumFailedVolumes() > 0) {
      for (String failedStorageLocation : getFSDataset()
          .getVolumeFailureSummary().getFailedStorageLocations()) {
        boolean found = false;
        for (Iterator<StorageLocation> newLocationItr =
             results.newLocations.iterator(); newLocationItr.hasNext();) {
          StorageLocation newLocation = newLocationItr.next();
          if (newLocation.getNormalizedUri().toString().equals(
              failedStorageLocation)) {
            // The failed storage is being re-added. DataNode#refreshVolumes()
            // will take care of re-assessing it.
            found = true;
          }
        }

        // New conf doesn't have this failed storage location.
        // Add to the deactivate locations list.
        if (!found) {
          LOG.info("...");  // <== print the log here
          // ... (rest of the block unchanged)
        }
      }
    }

* Please fix checkstyle warnings,
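
To make the first comment concrete, here is a minimal standalone sketch of the check with the suggested log line in place. It is not the patch itself: the class name {{FailedVolumeDiff}}, the method name {{findRemovedFailedLocations}} and the log wording are hypothetical, locations are compared as plain strings instead of {{StorageLocation}} objects, and {{java.util.logging}} stands in for the DataNode's {{LOG}}.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical standalone sketch; not the HDFS-11340 patch itself.
final class FailedVolumeDiff {
  private static final Logger LOG =
      Logger.getLogger(FailedVolumeDiff.class.getName());

  private FailedVolumeDiff() {
  }

  /**
   * Returns the failed storage locations that are absent from the new conf.
   * Locations are compared as plain strings here (an assumption made for
   * illustration); the DataNode compares normalized URIs.
   */
  static List<String> findRemovedFailedLocations(
      Collection<String> failedLocations, Collection<String> newLocations) {
    Set<String> newLocationSet = new HashSet<>(newLocations);
    List<String> removed = new ArrayList<>();
    for (String failedLocation : failedLocations) {
      if (!newLocationSet.contains(failedLocation)) {
        // The new conf doesn't have this failed storage location:
        // log it, then record it for deactivation.
        LOG.info("Failed storage location " + failedLocation
            + " is not in the new conf; marking it for deactivation.");
        removed.add(failedLocation);
      }
    }
    return removed;
  }
}
```

With a failed location D1 and a new conf containing only D2, the sketch logs D1 once and returns it. A failed location that is re-added in the new conf is neither logged nor returned.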

> DataNode reconfigure for disks doesn't remove the failed volumes
> ----------------------------------------------------------------
>                 Key: HDFS-11340
>                 URL: https://issues.apache.org/jira/browse/HDFS-11340
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11340.01.patch, HDFS-11340.02.patch, HDFS-11340.03.patch
> Say a DataNode (uuid:xyz) has disks D1 and D2. When D1 turns bad, a JMX query on FSDatasetState-xyz
> for the "NumFailedVolumes" attr correctly shows the failed volume count as 1, and the "FailedStorageLocations"
> attr has the failed storage location as "D1".
> It is possible to add or remove disks on this DataNode by running the {{reconfigure}} command.
> Suppose the failed disk D1 is removed from the conf, so that the new conf has only one good disk, D2.
> After running the reconfigure command for this DataNode with the new disk conf, the expectation
> is that the DataNode no longer reports "NumFailedVolumes" or "FailedStorageLocations". But even after
> the failed disk is removed from the conf and the reconfigure succeeds, the DataNode continues to
> show "NumFailedVolumes" as 1 and "FailedStorageLocations" as "D1", and these values never get reset.
