hadoop-hdfs-issues mailing list archives

From "Colin P. McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
Date Tue, 19 Jul 2016 01:04:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383413#comment-15383413 ]

Colin P. McCabe commented on HDFS-10301:

--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
@@ -308,10 +308,10 @@ public synchronized boolean checkLease(DatanodeDescriptor dn,
       return false;
     if (node.leaseId == 0) {
-      LOG.warn("BR lease 0x{} is not valid for DN {}, because the DN " +
-               "is not in the pending set.",
-               Long.toHexString(id), dn.getDatanodeUuid());
-      return false;
+      LOG.debug("DN {} is not in the pending set because BR with "
+              + "lease 0x{} was processed out of order",
+          dn.getDatanodeUuid(), Long.toHexString(id));
+      return true;

There are other reasons why {{node.leaseId}} might be 0 besides block reports being processed
out of order.  For example, an RPC could have been duplicated by something in the network.
Let's not change the existing error message.

            StorageBlockReport[] lastSplitReport =
                new StorageBlockReport[perVolumeBlockLists.size()];
            // When block reports are split, the last RPC in the block report
            // has the information about all storages in the block report.
            // See HDFS-10301 for more details. To achieve this, the last RPC
            // has 'n' storage reports, where 'n' is the number of storages in
            // a DN. The actual block replicas are reported only for the
            // last/n-th storage.
Why do we have to use such a complex and confusing approach?  As I commented earlier, a
report of the existing storages is not the same as a block report.  Why are we creating {{BlockListAsLongs}}
objects that aren't lists of blocks?

There is a much simpler approach, which is just adding a list of storage IDs to the block
report RPC by making a backwards-compatible protobuf change.  It's really easy:

+repeated string allStorageIds = 8;
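
To illustrate why an explicit list of storage IDs is simpler, here is a minimal sketch of how the receiving side could decide which storages are zombies. It is not code from any of the attached patches: the class name {{ZombieStorageSketch}}, the method {{findZombies}}, and the sample IDs such as "DS-1" are hypothetical, and real NameNode code would work with {{DatanodeStorageInfo}} objects under the namesystem lock rather than with plain strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: zombie detection driven by an explicit list of
// storage IDs sent with the block report, instead of by comparing
// per-storage blockReportId values.
class ZombieStorageSketch {

  /**
   * Returns the storage IDs that the NameNode knows for a DataNode but that
   * are absent from the list the DataNode sent (assumed here to arrive in a
   * field like allStorageIds). The result depends only on the two sets, not
   * on the order in which individual storage reports were processed.
   */
  static Set<String> findZombies(Set<String> knownStorageIds,
                                 List<String> reportedStorageIds) {
    // Sorted copy so the output is deterministic for logging.
    Set<String> zombies = new TreeSet<>(knownStorageIds);
    zombies.removeAll(new HashSet<>(reportedStorageIds));
    return zombies;
  }

  public static void main(String[] args) {
    Set<String> known = new HashSet<>(Arrays.asList("DS-1", "DS-2", "DS-3"));
    // The DataNode no longer has DS-2, so it is omitted from the list.
    List<String> reported = Arrays.asList("DS-1", "DS-3");
    System.out.println(findZombies(known, reported)); // prints [DS-2]
  }
}
```

Because the check is a plain set difference, a retransmitted or interleaved report that carries the same list yields the same answer.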

> BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Vinitha Reddy Gankidi
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
> When the NameNode is busy, a DataNode can time out while sending a block report. It then sends
> the block report again. While processing these two reports at the same time, the NameNode can
> interleave the processing of storages from different reports. This corrupts the blockReportId
> field, which makes the NameNode think that some storages are zombies. Replicas from zombie
> storages are immediately removed, causing missing blocks.
