hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11340) DataNode reconfigure for disks doesn't remove the failed volumes
Date Sat, 04 Feb 2017 06:40:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852647#comment-15852647 ]

Yiqun Lin commented on HDFS-11340:
----------------------------------

Thanks [~manojg] for working on this. I haven't looked through the whole patch yet, but I noticed this:
{code}
+  private static void checkDiskErrorSync(DataNode dn, FsVolumeImpl volume)
+      throws InterruptedException, TimeoutException {
+    final long lastDiskErrorCheck = dn.getLastDiskErrorCheck();
+    dn.checkDiskErrorAsync(volume);
+    // Wait 10 seconds for checkDiskError thread to
+    // finish and discover volume failures.
+    GenericTestUtils.waitFor(new Supplier<Boolean>() {
+      @Override
+      public Boolean get() {
+        if(dn.getLastDiskErrorCheck() != lastDiskErrorCheck) {
+          return true;
+        }
+        return false;
+      }
+    }, 1000, 10000);
+  }
+
{code}
Since HDFS-11353 has been merged to trunk, we can replace this method with {{DataNodeTestUtils#waitForDiskError}}.
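For reference, here is a minimal, self-contained sketch of the same trigger-then-poll pattern. It is not the real {{DataNodeTestUtils#waitForDiskError}} or {{GenericTestUtils#waitFor}}. {{FakeDataNode}} and {{DiskErrorWait}} are hypothetical stand-ins, and a counter replaces the DataNode's last-check timestamp.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a DataNode whose disk check runs asynchronously.
class FakeDataNode {
  // Counter used in place of the real last-disk-error-check timestamp.
  private final AtomicLong lastDiskErrorCheck = new AtomicLong(0);
  private final ExecutorService checker = Executors.newSingleThreadExecutor();

  long getLastDiskErrorCheck() {
    return lastDiskErrorCheck.get();
  }

  // Schedules a check in the background; the counter is bumped when it completes.
  void checkDiskErrorAsync() {
    checker.submit(() -> {
      try {
        Thread.sleep(50); // simulate disk probing latency
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      lastDiskErrorCheck.incrementAndGet();
    });
  }

  void shutdown() {
    checker.shutdownNow();
  }
}

final class DiskErrorWait {
  private DiskErrorWait() {}

  // Polls `check` every `intervalMs` until it returns true or `timeoutMs` elapses.
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  // Triggers an async check, then blocks until the check marker changes.
  static void waitForDiskError(FakeDataNode dn)
      throws InterruptedException, TimeoutException {
    final long before = dn.getLastDiskErrorCheck();
    dn.checkDiskErrorAsync();
    waitFor(() -> dn.getLastDiskErrorCheck() != before, 10, 10_000);
  }
}
```

Keeping the wait in one shared helper means each test only states which DataNode (and, in the real utility, which volume) to check. The polling interval and timeout live in a single place.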

> DataNode reconfigure for disks doesn't remove the failed volumes
> ----------------------------------------------------------------
>
>                 Key: HDFS-11340
>                 URL: https://issues.apache.org/jira/browse/HDFS-11340
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11340.01.patch, HDFS-11340.02.patch
>
>
> Say a DataNode (uuid:xyz) has disks D1 and D2. When D1 goes bad, a JMX query on FSDatasetState-xyz
> for the "NumFailedVolumes" attribute correctly shows a failed volume count of 1, and the
> "FailedStorageLocations" attribute lists "D1" as the failed storage location.
> Disks can be added to or removed from this DataNode by running the {{reconfigure}} command.
> Suppose the failed disk D1 is removed from the conf, so that the new conf has only the good disk D2.
> After running the reconfigure command for this DataNode with the new disk conf, the expectation
> is that the DataNode no longer reports any "NumFailedVolumes" or "FailedStorageLocations". However,
> even after the failed disk is removed from the conf and the reconfigure succeeds, the DataNode
> continues to show "NumFailedVolumes" as 1 and "FailedStorageLocations" as "D1", and these values are never reset.
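To make the expected behavior concrete, here is a hypothetical, simplified model of per-DataNode volume state. {{VolumeStateModel}} and its methods are invented for illustration. This is neither {{FsDatasetImpl}} nor the HDFS-11340 patch. The step the report describes as missing is that a reconfigure should drop failure records for locations that are no longer configured.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a DataNode's volume bookkeeping.
class VolumeStateModel {
  private final Set<String> configured = new LinkedHashSet<>();
  private final Set<String> failed = new LinkedHashSet<>();

  VolumeStateModel(List<String> dataDirs) {
    configured.addAll(dataDirs);
  }

  // Record that a disk check found this configured location bad.
  void markFailed(String location) {
    if (configured.contains(location)) {
      failed.add(location);
    }
  }

  // Apply a new data-dir list, as a reconfigure would.
  void reconfigure(List<String> newDataDirs) {
    configured.clear();
    configured.addAll(newDataDirs);
    // Forget failures for locations that are no longer configured;
    // failed locations still present in the new conf stay reported.
    failed.retainAll(configured);
  }

  int getNumFailedVolumes() {
    return failed.size();
  }

  List<String> getFailedStorageLocations() {
    return Collections.unmodifiableList(new ArrayList<>(failed));
  }
}
```

In this model, marking D1 as failed and then reconfiguring to [D2] leaves zero failed volumes and an empty failed-location list. That is the reset the report expects from the JMX attributes.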




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


