hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
Date Wed, 13 May 2015 00:00:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541061#comment-14541061 ]

Colin Patrick McCabe commented on HDFS-8380:

Background: HDFS-6830 attempted to implement "block shifting logic": when the NameNode
received a report that some replica was in a particular DataNode storage, it would update
the NN's internal data structures to reflect the fact that this replica was not in any other
storage on that DataNode.  The assumption was (and still is) that each replica is present
in at most one storage on each DN (an assumption we might want to revisit at some point, but
that's outside the scope of this JIRA...).
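
To make the shifting idea concrete, here is a minimal sketch, assuming a much simpler data model than the NameNode really uses.  The class {{ShiftingReplicaMap}} and its methods are hypothetical names invented for illustration; they are not part of BlockManager.  Keying the map by (datanode, block) means a new report for the same replica simply overwrites the old storage, so the "at most one storage per DN" assumption holds by construction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of block shifting; not the real NameNode structures.
final class ShiftingReplicaMap {
    // datanode ID -> (block ID -> ID of the storage holding the replica on that datanode)
    private final Map<String, Map<Long, String>> replicas = new HashMap<>();

    /**
     * Records that the datanode reported the block in the given storage.
     * Any earlier storage recorded for the same (datanode, block) pair is replaced.
     *
     * @return the storage the replica was previously recorded in, or null if none
     */
    String reportReplica(String datanodeId, long blockId, String storageId) {
        return replicas
            .computeIfAbsent(datanodeId, k -> new HashMap<>())
            .put(blockId, storageId);
    }

    /** Returns the storage currently recorded for the replica, or null if unknown. */
    String storageOf(String datanodeId, long blockId) {
        Map<Long, String> byBlock = replicas.get(datanodeId);
        return byBlock == null ? null : byBlock.get(blockId);
    }

    public static void main(String[] args) {
        ShiftingReplicaMap map = new ShiftingReplicaMap();
        map.reportReplica("dn1", 42L, "storageA");
        // The same replica is later reported in a different storage on the same datanode.
        String previous = map.reportReplica("dn1", 42L, "storageB");
        System.out.println(previous);                   // storageA
        System.out.println(map.storageOf("dn1", 42L));  // storageB
    }
}
```

The point of the sketch is only that the update has to run on every report; if the report is skipped because the block ID is already known, the map keeps the stale storage.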

HDFS-6830 was flawed, however.  Although it changed {{BlockManager#addBlock}} to update the
storage which a particular block was in, it would not actually call {{BlockManager#addBlock}}
on blocks it received in the full block report, if it had already seen their IDs.  So in the
case where blocks were moved between storages, HDFS-6830 would not actually update the internal
data structures on the NameNode; the blocks would remain recorded in their old storages.

HDFS-6991, although it would appear to be unrelated based on the title, actually has a partial
fix for the bug in HDFS-6830, in the form of this code:

-        && (!storedBlock.findDatanode(dn)
-        || corruptReplicas.isReplicaCorrupt(storedBlock, dn))) {
+        && (storedBlock.findStorageInfo(storageInfo) == -1 ||
+            corruptReplicas.isReplicaCorrupt(storedBlock, dn))) {
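
The difference between the two checks in the diff above can be sketched as follows.  This is an illustration under simplifying assumptions, not the HDFS code: {{StoredBlockSketch}}, {{oldCheck}} and {{newCheck}} are hypothetical names, the method signatures take plain strings rather than the real datanode and storage objects, and the corrupt-replica clause and the other conditions of the real {{if}} statement are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the NameNode's per-block location list.
final class StoredBlockSketch {
    private static final class Location {
        final String datanodeId;
        final String storageId;

        Location(String datanodeId, String storageId) {
            this.datanodeId = datanodeId;
            this.storageId = storageId;
        }
    }

    private final List<Location> locations = new ArrayList<>();

    void addLocation(String datanodeId, String storageId) {
        locations.add(new Location(datanodeId, storageId));
    }

    /** True if any storage on the given datanode is recorded for this block. */
    boolean findDatanode(String datanodeId) {
        for (Location loc : locations) {
            if (loc.datanodeId.equals(datanodeId)) {
                return true;
            }
        }
        return false;
    }

    /** Index of the exact (datanode, storage) pair, or -1 if it is not recorded. */
    int findStorageInfo(String datanodeId, String storageId) {
        for (int i = 0; i < locations.size(); i++) {
            Location loc = locations.get(i);
            if (loc.datanodeId.equals(datanodeId) && loc.storageId.equals(storageId)) {
                return i;
            }
        }
        return -1;
    }

    /** Old-style check: process the reported replica only if the datanode is unknown. */
    static boolean oldCheck(StoredBlockSketch block, String datanodeId) {
        return !block.findDatanode(datanodeId);
    }

    /** New-style check: process the reported replica if this exact storage is unknown. */
    static boolean newCheck(StoredBlockSketch block, String datanodeId, String storageId) {
        return block.findStorageInfo(datanodeId, storageId) == -1;
    }

    public static void main(String[] args) {
        StoredBlockSketch block = new StoredBlockSketch();
        block.addLocation("dn1", "storageA");
        // The replica has moved to storageB on the same datanode.
        System.out.println(oldCheck(block, "dn1"));              // false: the move is ignored
        System.out.println(newCheck(block, "dn1", "storageB"));  // true: the move is processed
    }
}
```

With the datanode-level check, a replica that moves to another storage on a datanode the NameNode already knows about is never reprocessed; the storage-level check catches that case.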

However, HDFS-6991 doesn't fix the issue for RBW (replica being written) blocks.  Admittedly,
it is much less likely for RBW blocks to be shifted between storages, because when a datanode
restarts, its RBW replicas become RWR (replica waiting to be recovered).  However, for the sake
of robustness, we should implement the shifting behavior there too.

This patch does that.  It also adds logging for the first time we receive a storage report
for a given storage.  This should happen only once per storage, so it won't generate too many
logs.  It will be useful for tracing what is going on.  It also adds debug logs to the initial
storage report, similar to the debug logs available for the non-initial storage report.  Finally,
it adds a unit test for the shifting behavior.  The unit test tests shifting of finalized
blocks rather than RBW ones, so it doesn't require the rest of the patch to pass, but it's
still very useful for preventing regressions.

> Always call addStoredBlock on blocks which have been shifted from one storage to another
> ----------------------------------------------------------------------------------------
>                 Key: HDFS-8380
>                 URL: https://issues.apache.org/jira/browse/HDFS-8380
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-8380.001.patch
> We should always call addStoredBlock on blocks which have been shifted from one storage
> to another.

This message was sent by Atlassian JIRA