hadoop-hdfs-issues mailing list archives

From "Yi Liu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-7930) commitBlockSynchronization() does not remove locations
Date Thu, 19 Mar 2015 06:13:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368539#comment-14368539 ]

Yi Liu edited comment on HDFS-7930 at 3/19/15 6:12 AM:
-------------------------------------------------------

[~shv], thanks for the nice discussion. I took a further look and found more:
*1.* {quote}  but I don't know what is the reason for postponing replica deletion. Postponing
should probably be avoided in this case, since the commitBlockSync() is as good as block report
for the particular block.{quote}

The result is as follows:
  * The postponing log in {{invalidateBlock}}, called from {{markBlockAsCorrupt}}, is not a result of block recovery; it is a result of the block report from {{dn0}} after it restarts.
  * As discussed, we call {{setDataNodeDead}} after {{dn0}} shuts down, so the NN's heartbeat check finds {{dn0}} dead and removes it. Later we truncate, block recovery happens, and the genStamp of the last block is increased. Then {{dn0}} restarts successfully and sends a block report, but its replica of the last block has the old genStamp, so it is marked as corrupt. When the NN invalidates that replica, {{nr.replicasOnStaleNodes() > 0}} is *true*, because {{DatanodeStorageInfo#blockContentsStale}} for {{dn0}} in the NN is still *true* at this time. We should set {{blockContentsStale}} to *false* after a heartbeat, but the current logic sets it to *false* in {{DatanodeStorageInfo#receivedBlockReport}}, which is called only after the NN finishes processing the block report in {{BlockManager}}.
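The flag timing above can be sketched with a minimal, hypothetical model; the class and method names mirror the real HDFS ones but this is illustrative code, not the actual implementation:

```java
// Hypothetical, simplified model of the stale-flag timing; not the HDFS code.
class StorageModel {
    // Set when the DN (re)registers; marks its block contents as possibly stale.
    boolean blockContentsStale = true;

    void heartbeat() {
        // Current behavior: a heartbeat alone does NOT clear the stale flag.
    }

    void receivedBlockReport() {
        // The flag is cleared only after the full block report is processed.
        blockContentsStale = false;
    }

    boolean postponeInvalidation() {
        // invalidateBlock() postpones deletion while a replica sits on stale storage.
        return blockContentsStale;
    }
}

public class StaleFlagDemo {
    public static void main(String[] args) {
        StorageModel dn0 = new StorageModel(); // dn0 restarts and re-registers
        dn0.heartbeat();
        // Still stale here, so invalidation of the corrupt replica is postponed.
        System.out.println("postponed after heartbeat: " + dn0.postponeInvalidation());
        dn0.receivedBlockReport();
        System.out.println("postponed after block report: " + dn0.postponeInvalidation());
    }
}
```

In this model the window between the restart and {{receivedBlockReport}} is exactly where the postponing log appears.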

*2.*
{quote}
otherwise NN can schedule recovery (with 1/3 probablity) on the dead DN as a primary. It will
wait for the heartbeat from it, but will never receive it because it is down.
{quote}
Yes, the {{checkBlockRecovery()}} timeout is caused by this. Currently, when starting block recovery, the NN chooses a DN (say {{dn0}}) as primary; {{dn0}} receives the {{BlockRecoveryCommand}} in the response to its heartbeat to the NN. But if {{dn0}} is dead, block recovery does not happen until {{expiredHardLimit()}} is *true* in {{LeaseManager#checkLeases}}; then a new DN is chosen as the primary and a new block recovery is scheduled for that last block.
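The hazard can be sketched as follows; {{DataNode}} and {{choosePrimary}} here are illustrative names, not the HDFS API, and the 1/3 probability comes from picking one of three replica locations without checking liveness:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the primary-selection hazard; not the HDFS code.
public class PrimarySelectionDemo {
    static class DataNode {
        final String name;
        final boolean alive;
        DataNode(String name, boolean alive) { this.name = name; this.alive = alive; }
    }

    // Simplified stand-in for the NN choosing a primary for block recovery;
    // it does not check liveness, mirroring the problem described above.
    static DataNode choosePrimary(List<DataNode> replicas, int pick) {
        return replicas.get(pick % replicas.size());
    }

    public static void main(String[] args) {
        List<DataNode> replicas = Arrays.asList(
            new DataNode("dn0", false), // shut down
            new DataNode("dn1", true),
            new DataNode("dn2", true));
        DataNode primary = choosePrimary(replicas, 0);
        // A dead primary never heartbeats, so it never receives the
        // BlockRecoveryCommand; recovery stalls until the lease hard limit
        // expires and a new primary is chosen.
        System.out.println("primary=" + primary.name + " alive=" + primary.alive);
    }
}
```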

*3.*
{quote}
You need to call setDataNodeDead() after shutdown()
{quote}
Our modification to {{testTruncateWithDataNodesRestart}} asserts that the replica count is decreased by 1, because the replica of the last block on {{dn0}} should be marked as corrupt in the NN after block recovery. If we call {{setDataNodeDead}} after {{dn0}} shuts down and before {{truncate}}, the replica count has already decreased by 1, and the newly added assertion doesn't make sense.
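The ordering argument reduces to a count; a tiny hypothetical sketch (names are illustrative, not from the test) shows that both paths leave the same replica count, so the assertion cannot tell them apart if {{dn0}} is removed up front:

```java
// Hypothetical sketch: both "dn0 removed before truncate" and "dn0's replica
// marked corrupt after recovery" leave replication - 1 live replicas, so an
// assertion on the count only tests recovery if dn0 was NOT removed up front.
public class AssertionOrderDemo {
    static int liveReplicas(int replication, boolean dn0Gone) {
        return dn0Gone ? replication - 1 : replication;
    }

    public static void main(String[] args) {
        int replication = 3;
        int removedBeforeTruncate = liveReplicas(replication, true); // dn0 already removed
        int corruptAfterRecovery  = liveReplicas(replication, true); // dn0 replica corrupt
        System.out.println(removedBeforeTruncate == corruptAfterRecovery);
    }
}
```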

*4.*
{quote}
BTW, your change completely eliminates the failure of testTruncateWithDataNodesRestartImmediately()
from HDFS-7886, which I ran without triggerBlockReports().
{quote}
Right :)


was (Author: hitliuyi):
[~shv], thanks for the nice discussion. I took a further look and found more:
*1.* {quote}  but I don't know what is the reason for postponing replica deletion. Postponing
should probably be avoided in this case, since the commitBlockSync() is as good as block report
for the particular block.{quote}

This is another bug; I will file a new JIRA for it. I explain it briefly below:
  * The postponing log in {{invalidateBlock}}, called from {{markBlockAsCorrupt}}, is not a result of block recovery; it is a result of the block report from {{dn0}} after it restarts.
  * As discussed, we call {{setDataNodeDead}} after {{dn0}} shuts down, so the NN's heartbeat check finds {{dn0}} dead and removes it. Later we truncate, block recovery happens, and the genStamp of the last block is increased. Then {{dn0}} restarts successfully and sends a block report, but its replica of the last block has the old genStamp, so it is marked as corrupt. When the NN invalidates that replica, {{nr.replicasOnStaleNodes() > 0}} is *true*, because {{DatanodeStorageInfo#blockContentsStale}} for {{dn0}} in the NN is still *true* at this time. We should set {{blockContentsStale}} to *false* after a heartbeat, but the current logic sets it to *false* in {{DatanodeStorageInfo#receivedBlockReport}}, which is called only after the NN finishes processing the block report in {{BlockManager}}. This is incorrect.

*2.*
{quote}
otherwise NN can schedule recovery (with 1/3 probablity) on the dead DN as a primary. It will
wait for the heartbeat from it, but will never receive it because it is down.
{quote}
Yes, the {{checkBlockRecovery()}} timeout is caused by this. Currently, when starting block recovery, the NN chooses a DN (say {{dn0}}) as primary; {{dn0}} receives the {{BlockRecoveryCommand}} in the response to its heartbeat to the NN. But if {{dn0}} is dead, block recovery does not happen until {{expiredHardLimit()}} is *true* in {{LeaseManager#checkLeases}}; then a new DN is chosen as the primary and a new block recovery is scheduled for that last block.

*3.*
{quote}
You need to call setDataNodeDead() after shutdown()
{quote}
Our modification to {{testTruncateWithDataNodesRestart}} asserts that the replica count is decreased by 1, because the replica of the last block on {{dn0}} should be marked as corrupt in the NN after block recovery. If we call {{setDataNodeDead}} after {{dn0}} shuts down and before {{truncate}}, the replica count has already decreased by 1, and the newly added assertion doesn't make sense.

*4.*
{quote}
BTW, your change completely eliminates the failure of testTruncateWithDataNodesRestartImmediately()
from HDFS-7886, which I ran without triggerBlockReports().
{quote}
Right :)

> commitBlockSynchronization() does not remove locations
> ------------------------------------------------------
>
>                 Key: HDFS-7930
>                 URL: https://issues.apache.org/jira/browse/HDFS-7930
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Konstantin Shvachko
>            Assignee: Yi Liu
>            Priority: Blocker
>         Attachments: HDFS-7930.001.patch, HDFS-7930.002.patch, HDFS-7930.003.patch
>
>
> When {{commitBlockSynchronization()}} has fewer {{newTargets}} than the original block,
it does not remove the unconfirmed locations. As a result, the block stores locations
with different lengths or genStamps (corrupt).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
