hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages
Date Thu, 21 Apr 2016 18:08:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252377#comment-15252377 ]

Konstantin Shvachko commented on HDFS-10301:
--------------------------------------------

Hey Walter, your patch looks good by itself, but it does not address the bug in the zombie
storage recognition.
It took me some time to review your patch; it would have been easier if you had explained your approach.
So your patch is reordering block reports for different storages in such a way that storages
from the same report are placed as a contiguous segment in the block report queue, so that
processing of different BRs is not interleaved. This addresses Daryn's comment rather than
solving the reported bug, as BTW Daryn correctly stated.
If you want to go forward with reordering of BRs you should probably do it in another issue.
I personally am not a supporter because
# It introduces an unnecessary restriction on the order of execution of block reports, and
# adds even more complexity to BR processing logic.

As I see it, the main problem here is that block reports used to be idempotent per storage, but HDFS-7960
made execution of a subsequent storage dependent on the state produced during execution of
the previous ones. I think idempotent is good, and we should keep it. I think we can mitigate
the problem by one of the following:
# Changing the criteria of zombie storage recognition. Why should it depend on block report
IDs?
# Eliminating the notion of zombie storage altogether. E.g., NN can ask DN to run {{DirectoryScanner}}
if NN thinks DN's state is outdated.
# Moving {{curBlockReportId}} from {{DatanodeDescriptor}} to {{StorageInfo}}, which will
eliminate global state between storages.
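
To illustrate what "idempotent per storage" means above, here is a minimal toy sketch. It is not HDFS code: the class {{PerStorageReportSketch}} and its methods are hypothetical, and the model assumes that a storage report simply replaces what is known about that one storage. Under that assumption no state is shared between storages, so duplicated or interleaved reports end in the same state.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical names, not the HDFS classes.
class PerStorageReportSketch {
    // Block IDs known for each storage. Nothing here is shared across storages.
    private final Map<String, Set<Long>> blocksByStorage = new HashMap<>();

    /** Replaces what is known about one storage with the reported block IDs. */
    void applyStorageReport(String storageId, List<Long> reportedBlockIds) {
        blocksByStorage.put(storageId, new TreeSet<>(reportedBlockIds));
    }

    /** Returns a defensive copy of the current per-storage state. */
    Map<String, Set<Long>> snapshot() {
        Map<String, Set<Long>> copy = new HashMap<>();
        blocksByStorage.forEach((id, blocks) -> copy.put(id, new TreeSet<>(blocks)));
        return copy;
    }

    public static void main(String[] args) {
        List<Long> s1Blocks = Arrays.asList(1L, 2L);
        List<Long> s2Blocks = Arrays.asList(3L);

        // The report is applied once, storage by storage.
        PerStorageReportSketch once = new PerStorageReportSketch();
        once.applyStorageReport("s1", s1Blocks);
        once.applyStorageReport("s2", s2Blocks);

        // The same report is sent twice and the two copies are interleaved.
        PerStorageReportSketch interleaved = new PerStorageReportSketch();
        interleaved.applyStorageReport("s1", s1Blocks);
        interleaved.applyStorageReport("s1", s1Blocks);
        interleaved.applyStorageReport("s2", s2Blocks);
        interleaved.applyStorageReport("s2", s2Blocks);

        // Both end in the same state because each update touches one storage only.
        System.out.println(once.snapshot().equals(interleaved.snapshot()));
    }
}
```

Whether per-storage report IDs alone are enough to recognize a removed storage is a separate question that this sketch does not try to answer.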

Also if we cannot come up with a quick solution, then we should probably roll back HDFS-7960
for now and revisit it later, because this is a critical bug affecting all of our latest releases.
And that is a lot of clusters and PBs out there.

> Blocks removed by thousands due to falsely detected zombie storages
> -------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Priority: Critical
>         Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. Then it sends the
block report again. While processing these two reports at the same time, the NameNode can interleave
the processing of storages from different reports. This screws up the blockReportId field, which
makes the NameNode think that some storages are zombies. Replicas from zombie storages are immediately
removed, causing missing blocks.
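
The interleaving described above can be reproduced in a small toy model. This is a sketch under stated assumptions, not NameNode code: {{ZombieInterleavingSketch}}, {{StorageRpc}} and {{findZombies}} are hypothetical names, and the zombie rule is simplified to "when the last RPC of a report arrives, any storage whose most recent report ID differs from that report's ID is flagged".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: every name here is hypothetical; this is not NameNode code.
class ZombieInterleavingSketch {

    /** One per-storage RPC belonging to a block report. */
    static final class StorageRpc {
        final long reportId;
        final String storageId;
        final boolean lastRpcOfReport;

        StorageRpc(long reportId, String storageId, boolean lastRpcOfReport) {
            this.reportId = reportId;
            this.storageId = storageId;
            this.lastRpcOfReport = lastRpcOfReport;
        }
    }

    /** Processes RPCs in arrival order and returns storages flagged as zombies. */
    static List<String> findZombies(List<StorageRpc> rpcs) {
        Map<String, Long> lastReportIdByStorage = new TreeMap<>();
        List<String> zombies = new ArrayList<>();
        for (StorageRpc rpc : rpcs) {
            lastReportIdByStorage.put(rpc.storageId, rpc.reportId);
            if (!rpc.lastRpcOfReport) {
                continue;
            }
            // Simplified rule: a storage not stamped with the finishing report's ID is a zombie.
            for (Map.Entry<String, Long> e : lastReportIdByStorage.entrySet()) {
                boolean stale = e.getValue().longValue() != rpc.reportId;
                if (stale && !zombies.contains(e.getKey())) {
                    zombies.add(e.getKey());
                }
            }
        }
        return zombies;
    }

    /** Report 1 finishes before the retried report 2 starts. */
    static List<StorageRpc> sequential() {
        return Arrays.asList(
            new StorageRpc(1, "s1", false), new StorageRpc(1, "s2", true),
            new StorageRpc(2, "s1", false), new StorageRpc(2, "s2", true));
    }

    /** The retry (report 2) starts while report 1 is still being processed. */
    static List<StorageRpc> interleaved() {
        return Arrays.asList(
            new StorageRpc(1, "s1", false), new StorageRpc(2, "s1", false),
            new StorageRpc(1, "s2", true), new StorageRpc(2, "s2", true));
    }

    public static void main(String[] args) {
        System.out.println(findZombies(sequential()));  // prints []
        System.out.println(findZombies(interleaved())); // prints [s1]
    }
}
```

In the interleaved order, storage s1 is healthy and was reported in both copies, yet it is flagged because its ID was overwritten by the retry before the first report completed.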



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
