Date: Wed, 30 Jul 2014 20:50:39 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts

    [ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079950#comment-14079950 ]

Hadoop QA commented on HDFS-6772:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12658667/HDFS-6772.patch
  against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}.
The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
                  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7501//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7501//console

This message is automatically generated.

> Get DNs out of blockContentsStale==true state faster when NN restarts
> ---------------------------------------------------------------------
>
>                 Key: HDFS-6772
>                 URL: https://issues.apache.org/jira/browse/HDFS-6772
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-6772.patch
>
>
> Here is the non-HA scenario.
> 1. Get HDFS into a block-over-replicated situation.
> 2. Restart the NN.
> 3. From the NN's point of view, DNs will remain in the blockContentsStale==true state for a long time. That in turn makes postponedMisreplicatedBlocks large. A larger postponedMisreplicatedBlocks increases blockreport latency. Given that blockreport processing takes the NN global lock, this has a severe impact on NN performance and makes the cluster unstable.
> Why will DNs remain in the blockContentsStale==true state for a long time?
> 1. When a DN reconnects to the NN upon NN restart, the blockreport RPC could come in before the heartbeat RPC.
>    That is due to how BPServiceActor#offerService decides when to send the blockreport and the heartbeat. In the case of an NN restart, the NN will ask the DN to register when the NN gets the first heartbeat request; the DN will then register with the NN, followed by the blockreport RPC; the heartbeat RPC will come after that.
> 2. So right after the first blockreport, given that heartbeatedSinceFailover remains false, blockContentsStale will stay true.
> {noformat}
> DatanodeStorageInfo.java
>   void receivedBlockReport() {
>     if (heartbeatedSinceFailover) {
>       blockContentsStale = false;
>     }
>     blockReportCount++;
>   }
> {noformat}
> 3. So the DN will remain in blockContentsStale==true until the next blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to some large value.
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)
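
The ordering problem described in the issue can be replayed with a small, self-contained sketch. It is not the Hadoop code: StaleFlagSketch, StorageState and staleAfter are hypothetical names, and the assumption that a heartbeat is what sets heartbeatedSinceFailover comes from the issue's reasoning rather than from the quoted snippet. Only receivedBlockReport() mirrors the quoted logic.

```java
public class StaleFlagSketch {

    /** Hypothetical, simplified stand-in for the per-storage state the NN keeps. */
    static class StorageState {
        boolean heartbeatedSinceFailover = false; // reset on NN restart
        boolean blockContentsStale = true;        // stale until proven otherwise
        int blockReportCount = 0;

        // Assumption: a heartbeat is what flips this flag (not shown in the quoted snippet).
        void receivedHeartbeat() {
            heartbeatedSinceFailover = true;
        }

        // Same logic as the receivedBlockReport() quoted in the issue.
        void receivedBlockReport() {
            if (heartbeatedSinceFailover) {
                blockContentsStale = false;
            }
            blockReportCount++;
        }
    }

    /** Replays events in order ('H' = heartbeat, 'B' = blockreport) and returns the final stale flag. */
    static boolean staleAfter(String events) {
        StorageState s = new StorageState();
        for (char e : events.toCharArray()) {
            if (e == 'H') {
                s.receivedHeartbeat();
            } else if (e == 'B') {
                s.receivedBlockReport();
            }
        }
        return s.blockContentsStale;
    }

    public static void main(String[] args) {
        // Blockreport before heartbeat: the order described in the issue.
        System.out.println("B,H   -> stale=" + staleAfter("BH"));
        // Heartbeat before blockreport: the flag clears on the first report.
        System.out.println("H,B   -> stale=" + staleAfter("HB"));
        // The flag only clears once a second blockreport arrives.
        System.out.println("B,H,B -> stale=" + staleAfter("BHB"));
    }
}
```

In this model the storage stays stale after the "B,H" sequence and clears only on the following blockreport, which is why a large dfs.blockreport.intervalMsec keeps DNs in the stale state for so long.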