hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
Date Wed, 27 Apr 2016 07:18:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259688#comment-15259688 ]

Konstantin Shvachko commented on HDFS-10301:

??Maybe I'm misunderstanding the proposal, but don't we already do all of this???

Yes, you misunderstood. That part is not my proposal. It describes what we already do, which is
why I call those points *Constraints*: they complicate the *Problem*. The proposal is in the third
bullet point, titled *Approach*.

??What does the NameNode do if the DataNode is restarted while sending these RPCs, so that
it never gets a chance to send all the storages that it claimed existed?  It seems like you
will get stuck??

No, I will not get stuck. All br-RPCs are completely independent of each other. It's just
that one of them carries all storages and indicates to the NameNode that it should update its
storage list for the DataNode. The NN processes as many such RPCs as the DN sends. If the DN dies,
the NN will declare it dead in due time, or if the DN restarts within 10 minutes, it will send
a new set of block reports from scratch. I do not see any inconsistencies.

You can think of it as a new operation, SyncStorages, which does just that: it updates the NameNode's
knowledge of the DN's storages. I combined this operation with the first br-RPC. One could combine
it with any other call, just as you propose to combine it with the heartbeat. That seems
a poor idea, though, since we don't want to wait for the removal of thousands of replicas on a heartbeat.
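
The SyncStorages idea described above can be illustrated with a toy model. This is only a sketch
under assumptions drawn from this message: the class `SyncingNameNode`, its methods, and the
storage and block ids are hypothetical and are not HDFS code. It shows that each br-RPC is
handled independently and that only the one carrying the full storage list changes which storages
the NameNode knows about.

```python
class SyncingNameNode:
    """Toy NameNode view of one DataNode (hypothetical, not the HDFS classes)."""

    def __init__(self, storages=None):
        # storage id -> set of block ids the NameNode believes it holds
        self.storages = {sid: set(blocks) for sid, blocks in (storages or {}).items()}

    def sync_storages(self, claimed):
        """Make the known storage list match the list the DataNode claims."""
        removed = sorted(set(self.storages) - set(claimed))
        for sid in removed:
            del self.storages[sid]
        for sid in claimed:
            self.storages.setdefault(sid, set())
        return removed

    def block_report_rpc(self, storage_id, blocks, all_storages=None):
        """Process one br-RPC; it does not depend on any other br-RPC."""
        removed = []
        if all_storages is not None:
            # Assumption: only this RPC carries the complete storage list.
            removed = self.sync_storages(all_storages)
        self.storages[storage_id] = set(blocks)
        return removed


# The NameNode still remembers s3, which the DataNode no longer has.
nn = SyncingNameNode({"s1": {1}, "s2": {7}, "s3": {9}})

# First br-RPC reports s1 and also carries the full storage list.
removed = nn.block_report_rpc("s1", [101, 102], all_storages=["s1", "s2"])

# Suppose the DataNode restarts here and never sends the RPC for s2.
# The NameNode state is still consistent; s2 simply keeps its old view
# until the DataNode sends a new set of block reports.
print(removed)      # ['s3']
print(nn.storages)  # {'s1': {101, 102}, 's2': {7}}
```

In this model no report id is compared across RPCs, so the order in which the remaining br-RPCs
arrive does not affect which storages are removed.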

??interleaved block reports are extremely rare??

You keep saying this, but it is not rare for me. Are you trying to convince me not to believe my
eyes, or have you checked the logs on your thousands of clusters? I did check mine.

> BlockReport retransmissions may lead to storages falsely being declared zombie if storage
report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, HDFS-10301.01.patch,
> When the NameNode is busy, a DataNode can time out while sending a block report. It then sends the
block report again. The NameNode, while processing these two reports at the same time, can interleave
the processing of storages from different reports. This screws up the blockReportId field, which
makes the NameNode think that some storages are zombies. Replicas from zombie storages are immediately
removed, causing missing blocks.
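
The failure described in the issue can be reproduced in a simplified model. The sketch below
assumes, as a simplification not stated in full in this message, that the NameNode stamps each
storage with the id of the last block report that touched it and, after the last storage of a
report, treats every storage carrying a different stamp as a zombie. `ToyNameNode`, `replay`,
and the event tuples are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    # storage id -> id of the last block report processed for that storage
    last_report_id: dict = field(default_factory=dict)

    def process_storage(self, report_id, storage_id, is_last):
        """Process one storage of a report; return storages judged zombie."""
        self.last_report_id[storage_id] = report_id
        if not is_last:
            return []
        # Assumed zombie check: any storage not stamped with this report's id.
        return sorted(s for s, rid in self.last_report_id.items() if rid != report_id)


def replay(events):
    """Feed (report_id, storage_id, is_last) events to a fresh NameNode."""
    nn = ToyNameNode()
    zombies = []
    for report_id, storage_id, is_last in events:
        zombies += nn.process_storage(report_id, storage_id, is_last)
    return zombies


# Report 1 is the original; report 2 is the retransmission after a timeout.
in_order = [(1, "s1", False), (1, "s2", True), (2, "s1", False), (2, "s2", True)]
interleaved = [(1, "s1", False), (2, "s1", False), (2, "s2", True), (1, "s2", True)]

print(replay(in_order))     # []
print(replay(interleaved))  # ['s1']
```

In the interleaved run, s1 is a healthy storage that was reported in both transmissions, yet it
ends up stamped with report 2 when the last storage of report 1 is processed, so the model flags
it as a zombie.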

