Date: Thu, 19 Mar 2015 06:13:38 +0000 (UTC)
From: "Yi Liu (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (HDFS-7930) commitBlockSynchronization() does not remove locations

    [ https://issues.apache.org/jira/browse/HDFS-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368539#comment-14368539 ]

Yi Liu edited comment on HDFS-7930 at 3/19/15 6:12 AM:
-------------------------------------------------------

[~shv], thanks for the nice discussion. I took a further look and found more:

*1.* {quote}but I don't know what is the reason for postponing replica deletion. Postponing should probably be avoided in this case, since the commitBlockSync() is as good as block report for the particular block.{quote}
The result is as follows:
* The postponing log in {{invalidateBlock}}, called by {{markBlockAsCorrupt}}, is not a result of block recovery; it is the result of the block report from {{dn0}} after it restarts.
* As discussed, we call {{setDataNodeDead}} after {{dn0}} is shut down, so the NN checks heartbeats, finds {{dn0}} dead and removes it. Later we do the truncate, block recovery happens, and the genStamp of the last block is increased. Then {{dn0}} restarts successfully and its block report arrives, but its replica of the last block has the old genStamp, so it is marked as corrupt. When the NN invalidates that block replica, {{nr.replicasOnStaleNodes() > 0}} is *true*, because {{DatanodeStorageInfo#blockContentsStale}} for {{dn0}} in the NN is still *true* at this time. We should actually set {{blockContentsStale}} to *false* after the heartbeat, but in the current logic {{blockContentsStale}} is only set to *false* in {{DatanodeStorageInfo#receivedBlockReport}}, which is called after we finish processing the block report in {{BlockManager}}.
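For illustration, here is a small, self-contained sketch of the timing issue described in point 1. The class and method names ({{StorageState}}, {{heartbeatReceived}}, ...) are made up for this sketch and are not the real {{DatanodeStorageInfo}}/{{BlockManager}} API; it only models when the stale flag is cleared and why the invalidation is postponed while it is still set.

{code:java}
// Toy model only: StorageState and these method names are hypothetical,
// not the actual DatanodeStorageInfo / BlockManager code.
import java.util.List;

public class StaleReplicaSketch {

  static class StorageState {
    // True until the storage's block contents are known to be up to date.
    boolean blockContentsStale = true;

    void heartbeatReceived() {
      // Point 1 argues the flag should be cleared here, after the heartbeat ...
    }

    void blockReportProcessed() {
      // ... but today it is only cleared here, i.e. in receivedBlockReport(),
      // after BlockManager has finished processing the block report.
      blockContentsStale = false;
    }
  }

  // Mirrors the nr.replicasOnStaleNodes() > 0 test: while any replica of the
  // block sits on a stale storage, deletion of the corrupt replica is postponed.
  static boolean shouldPostponeInvalidation(List<StorageState> replicaStorages) {
    return replicaStorages.stream().anyMatch(s -> s.blockContentsStale);
  }

  public static void main(String[] args) {
    StorageState dn0 = new StorageState();
    dn0.heartbeatReceived();
    System.out.println(shouldPostponeInvalidation(List.of(dn0))); // true: still postponed
    dn0.blockReportProcessed();
    System.out.println(shouldPostponeInvalidation(List.of(dn0))); // false
  }
}
{code}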
*2.* {quote}otherwise NN can schedule recovery (with 1/3 probability) on the dead DN as a primary. It will wait for the heartbeat from it, but will never receive it because it is down.{quote}
Yes, the {{checkBlockRecovery()}} timeout is because of this. Currently, when starting block recovery the NN chooses a DN (say {{dn0}}) as the primary, and {{dn0}} receives the {{BlockRecoveryCommand}} through the response to its heartbeat to the NN. But if {{dn0}} is dead, block recovery does not happen until {{expiredHardLimit()}} is *true* in {{LeaseManager#checkLeases}}; only then is a new DN chosen as the primary and a new block recovery scheduled for that last block (a toy model of this is sketched below, after point 4).

*3.* {quote}You need to call setDataNodeDead() after shutdown(){quote}
Our modification to {{testTruncateWithDataNodesRestart}} is to assert that the replica count decreases by 1, because the replica of the last block on {{dn0}} should be marked as corrupt in the NN after block recovery. If we call {{setDataNodeDead}} after the {{dn0}} shutdown and before {{truncate}}, the replica count has already decreased by 1, so the newly added assertion would no longer make sense.

*4.* {quote}BTW, your change completely eliminates the failure of testTruncateWithDataNodesRestartImmediately() from HDFS-7886, which I ran without triggerBlockReports().{quote}
Right :)
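As promised in point 2, here is a toy model of why recovery stalls when the chosen primary is dead. All names here ({{PrimaryRecoveryModel}}, {{DataNodeStub}}, ...) are made up for illustration; this is not the real {{BlockRecoveryCommand}}/{{LeaseManager}} code. The point it captures is that the recovery command can only leave the NN as part of the primary's heartbeat response, so nothing moves until the hard limit expires and a live primary is picked.

{code:java}
// Toy model only: these classes and methods are hypothetical,
// not the actual NameNode/DataNode recovery code.
import java.util.List;
import java.util.Optional;
import java.util.Random;

public class PrimaryRecoveryModel {

  static class DataNodeStub {
    final String name;
    final boolean alive;
    DataNodeStub(String name, boolean alive) { this.name = name; this.alive = alive; }
  }

  private final Random random = new Random();

  // NN picks one of the replica locations as the recovery primary
  // (with 3 replicas, roughly a 1/3 chance of hitting the dead dn0).
  DataNodeStub choosePrimary(List<DataNodeStub> locations) {
    return locations.get(random.nextInt(locations.size()));
  }

  // The recovery command is delivered only as a reply to the primary's heartbeat.
  // A dead primary never heartbeats, so recovery stalls until the lease hard
  // limit expires and a new, live primary is chosen (the checkLeases() path).
  Optional<String> deliverRecoveryCommand(DataNodeStub primary, boolean hardLimitExpired,
                                          List<DataNodeStub> locations) {
    if (primary.alive) {
      return Optional.of("recover via " + primary.name);
    }
    if (!hardLimitExpired) {
      return Optional.empty();   // stalled: this is the checkBlockRecovery() timeout
    }
    DataNodeStub newPrimary =
        locations.stream().filter(d -> d.alive).findFirst().orElseThrow();
    return Optional.of("recover via " + newPrimary.name);
  }

  public static void main(String[] args) {
    PrimaryRecoveryModel model = new PrimaryRecoveryModel();
    List<DataNodeStub> dns = List.of(new DataNodeStub("dn0", false),
                                     new DataNodeStub("dn1", true),
                                     new DataNodeStub("dn2", true));
    System.out.println("chosen primary: " + model.choosePrimary(dns).name);
    DataNodeStub deadPrimary = dns.get(0);
    System.out.println(model.deliverRecoveryCommand(deadPrimary, false, dns)); // Optional.empty
    System.out.println(model.deliverRecoveryCommand(deadPrimary, true, dns));  // Optional[recover via dn1]
  }
}
{code}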
> commitBlockSynchronization() does not remove locations
> ------------------------------------------------------
>
>                 Key: HDFS-7930
>                 URL: https://issues.apache.org/jira/browse/HDFS-7930
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Konstantin Shvachko
>            Assignee: Yi Liu
>            Priority: Blocker
>         Attachments: HDFS-7930.001.patch, HDFS-7930.002.patch, HDFS-7930.003.patch
>
>
> When {{commitBlockSynchronization()}} has fewer {{newTargets}} than the original block, it does not remove the unconfirmed locations. As a result, the block keeps locations with different lengths or genStamps (corrupt).
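To make the issue statement above concrete, here is a minimal sketch of the location pruning being asked for. The class and the {{pruneLocations}} helper are hypothetical, not the actual {{FSNamesystem#commitBlockSynchronization}} code: any location that is not among the recovery's {{newTargets}} is unconfirmed and should be dropped, otherwise the block keeps replicas with a mismatched length or genStamp.

{code:java}
// Hypothetical sketch only; not the real FSNamesystem#commitBlockSynchronization.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CommitSyncSketch {

  // Keep only the locations confirmed by the recovery (the newTargets);
  // everything else is an unconfirmed replica with a stale length/genStamp.
  static List<String> pruneLocations(List<String> currentLocations, Set<String> newTargets) {
    List<String> kept = new ArrayList<>();
    for (String loc : currentLocations) {
      if (newTargets.contains(loc)) {
        kept.add(loc);
      }
      // else: dropped, so it cannot linger as a corrupt-looking location
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> current = List.of("dn0", "dn1", "dn2");
    Set<String> newTargets = Set.of("dn1", "dn2");   // dn0 missed the recovery
    System.out.println(pruneLocations(current, newTargets)); // [dn1, dn2]
  }
}
{code}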