Date: Thu, 19 Mar 2015 06:13:38 +0000 (UTC)
From: "Yi Liu (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (HDFS-7930) commitBlockSynchronization() does not remove locations

    [ https://issues.apache.org/jira/browse/HDFS-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368539#comment-14368539 ]

Yi Liu edited comment on HDFS-7930 at 3/19/15 6:12 AM:
-------------------------------------------------------

[~shv], thanks for the nice discussion. I took a further look and found more:

*1.* {quote}but I don't know what is the reason for postponing replica deletion. Postponing should probably be avoided in this case, since the commitBlockSync() is as good as block report for the particular block.{quote}
The result is as follows:
* The postponing log in {{invalidateBlock}}, called by {{markBlockAsCorrupt}}, is not a result of block recovery; it is the result of the block report from {{dn0}} after it restarts.
* As discussed, we call {{setDataNodeDead}} after {{dn0}} is shut down, so the NN checks heartbeats, finds {{dn0}} dead and removes it. Later we do the truncate, block recovery happens, and the genStamp of the last block is increased. Then {{dn0}} restarts successfully and its block report arrives, but its replica of the last block has the old genStamp, so it is marked as corrupt. When the NN invalidates that block replica, {{nr.replicasOnStaleNodes() > 0}} is *true*, because {{DatanodeStorageInfo#blockContentsStale}} for {{dn0}} in the NN is still *true* at this time. We should actually set {{blockContentsStale}} to *false* after the heartbeat, but in the current logic {{blockContentsStale}} is only set to *false* in {{DatanodeStorageInfo#receivedBlockReport}}, which is called after we finish processing the block report in {{BlockManager}}.
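For illustration, here is a small, self-contained sketch of the timing issue described in point 1. The class and method names ({{StorageState}}, {{heartbeatReceived}}, ...) are made up for this sketch and are not the real {{DatanodeStorageInfo}}/{{BlockManager}} API; it only models when the stale flag is cleared and why the invalidation is postponed while it is still set.

{code:java}
// Toy model only: StorageState and these method names are hypothetical,
// not the actual DatanodeStorageInfo / BlockManager code.
import java.util.List;

public class StaleReplicaSketch {

  static class StorageState {
    // True until the storage's block contents are known to be up to date.
    boolean blockContentsStale = true;

    void heartbeatReceived() {
      // Point 1 argues the flag should be cleared here, after the heartbeat ...
    }

    void blockReportProcessed() {
      // ... but today it is only cleared here, i.e. in receivedBlockReport(),
      // after BlockManager has finished processing the block report.
      blockContentsStale = false;
    }
  }

  // Mirrors the nr.replicasOnStaleNodes() > 0 test: while any replica of the
  // block sits on a stale storage, deletion of the corrupt replica is postponed.
  static boolean shouldPostponeInvalidation(List<StorageState> replicaStorages) {
    return replicaStorages.stream().anyMatch(s -> s.blockContentsStale);
  }

  public static void main(String[] args) {
    StorageState dn0 = new StorageState();
    dn0.heartbeatReceived();
    System.out.println(shouldPostponeInvalidation(List.of(dn0))); // true: still postponed
    dn0.blockReportProcessed();
    System.out.println(shouldPostponeInvalidation(List.of(dn0))); // false
  }
}
{code}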
*2.* {quote}otherwise NN can schedule recovery (with 1/3 probability) on the dead DN as a primary. It will wait for the heartbeat from it, but will never receive it because it is down.{quote}
Yes, the {{checkBlockRecovery()}} timeout is because of this. Currently, when starting block recovery the NN chooses a DN (say {{dn0}}) as the primary, and {{dn0}} receives the {{BlockRecoveryCommand}} through the response to its heartbeat to the NN. But if {{dn0}} is dead, block recovery does not happen until {{expiredHardLimit()}} is *true* in {{LeaseManager#checkLeases}}; only then is a new DN chosen as the primary and a new block recovery scheduled for that last block (a toy model of this is sketched below, after point 4).

*3.* {quote}You need to call setDataNodeDead() after shutdown(){quote}
Our modification to {{testTruncateWithDataNodesRestart}} is to assert that the replica count decreases by 1, because the replica of the last block on {{dn0}} should be marked as corrupt in the NN after block recovery. If we call {{setDataNodeDead}} after the {{dn0}} shutdown and before {{truncate}}, the replica count has already decreased by 1, so the newly added assertion would no longer make sense.

*4.* {quote}BTW, your change completely eliminates the failure of testTruncateWithDataNodesRestartImmediately() from HDFS-7886, which I ran without triggerBlockReports().{quote}
Right :)
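As promised in point 2, here is a toy model of why recovery stalls when the chosen primary is dead. All names here ({{PrimaryRecoveryModel}}, {{DataNodeStub}}, ...) are made up for illustration; this is not the real {{BlockRecoveryCommand}}/{{LeaseManager}} code. The point it captures is that the recovery command can only leave the NN as part of the primary's heartbeat response, so nothing moves until the hard limit expires and a live primary is picked.

{code:java}
// Toy model only: these classes and methods are hypothetical,
// not the actual NameNode/DataNode recovery code.
import java.util.List;
import java.util.Optional;
import java.util.Random;

public class PrimaryRecoveryModel {

  static class DataNodeStub {
    final String name;
    final boolean alive;
    DataNodeStub(String name, boolean alive) { this.name = name; this.alive = alive; }
  }

  private final Random random = new Random();

  // NN picks one of the replica locations as the recovery primary
  // (with 3 replicas, roughly a 1/3 chance of hitting the dead dn0).
  DataNodeStub choosePrimary(List<DataNodeStub> locations) {
    return locations.get(random.nextInt(locations.size()));
  }

  // The recovery command is delivered only as a reply to the primary's heartbeat.
  // A dead primary never heartbeats, so recovery stalls until the lease hard
  // limit expires and a new, live primary is chosen (the checkLeases() path).
  Optional<String> deliverRecoveryCommand(DataNodeStub primary, boolean hardLimitExpired,
                                          List<DataNodeStub> locations) {
    if (primary.alive) {
      return Optional.of("recover via " + primary.name);
    }
    if (!hardLimitExpired) {
      return Optional.empty();   // stalled: this is the checkBlockRecovery() timeout
    }
    DataNodeStub newPrimary =
        locations.stream().filter(d -> d.alive).findFirst().orElseThrow();
    return Optional.of("recover via " + newPrimary.name);
  }

  public static void main(String[] args) {
    PrimaryRecoveryModel model = new PrimaryRecoveryModel();
    List<DataNodeStub> dns = List.of(new DataNodeStub("dn0", false),
                                     new DataNodeStub("dn1", true),
                                     new DataNodeStub("dn2", true));
    System.out.println("chosen primary: " + model.choosePrimary(dns).name);
    DataNodeStub deadPrimary = dns.get(0);
    System.out.println(model.deliverRecoveryCommand(deadPrimary, false, dns)); // Optional.empty
    System.out.println(model.deliverRecoveryCommand(deadPrimary, true, dns));  // Optional[recover via dn1]
  }
}
{code}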
> commitBlockSynchronization() does not remove locations
> ------------------------------------------------------
>
>                 Key: HDFS-7930
>                 URL: https://issues.apache.org/jira/browse/HDFS-7930
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Konstantin Shvachko
>            Assignee: Yi Liu
>            Priority: Blocker
>         Attachments: HDFS-7930.001.patch, HDFS-7930.002.patch, HDFS-7930.003.patch
>
>
> When {{commitBlockSynchronization()}} has fewer {{newTargets}} than the original block, it does not remove the unconfirmed locations. As a result, the block keeps locations with different lengths or genStamps (corrupt).
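To make the issue statement above concrete, here is a minimal sketch of the location pruning being asked for. The class and the {{pruneLocations}} helper are hypothetical, not the actual {{FSNamesystem#commitBlockSynchronization}} code: any location that is not among the recovery's {{newTargets}} is unconfirmed and should be dropped, otherwise the block keeps replicas with a mismatched length or genStamp.

{code:java}
// Hypothetical sketch only; not the real FSNamesystem#commitBlockSynchronization.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CommitSyncSketch {

  // Keep only the locations confirmed by the recovery (the newTargets);
  // everything else is an unconfirmed replica with a stale length/genStamp.
  static List<String> pruneLocations(List<String> currentLocations, Set<String> newTargets) {
    List<String> kept = new ArrayList<>();
    for (String loc : currentLocations) {
      if (newTargets.contains(loc)) {
        kept.add(loc);
      }
      // else: dropped, so it cannot linger as a corrupt-looking location
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> current = List.of("dn0", "dn1", "dn2");
    Set<String> newTargets = Set.of("dn1", "dn2");   // dn0 missed the recovery
    System.out.println(pruneLocations(current, newTargets)); // [dn1, dn2]
  }
}
{code}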