Date: Wed, 24 Feb 2016 05:14:18 +0000 (UTC)
From: "Rakesh R (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-9837) BlockManager#countNodes should be able to detect duplicated internal blocks

    [ https://issues.apache.org/jira/browse/HDFS-9837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15160190#comment-15160190 ]

Rakesh R commented on HDFS-9837:
--------------------------------

bq. We're using BitSet here thus we're not tracking all the storages (whose total number can exceed 9) but all the possible internal blocks. Only need to make sure the bitset covers the block ID range.

Yeah, I got it. Agreed.
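
For illustration, the BitSet idea quoted above could look roughly like the following. This is only a minimal sketch and not the code in the attached patches: the class {{InternalBlockCounter}}, its methods, and the constant {{GROUP_WIDTH}} are hypothetical names, and the width of 9 is an assumption taken from the issue's example of a block group with 9 internal blocks. Each bit stands for one internal block index rather than one storage, so a second report of the same index does not raise the live count.

```java
import java.util.BitSet;

/** Hypothetical helper for illustration; not the actual BlockManager code. */
final class InternalBlockCounter {
    /** Assumed number of internal blocks per group (9 in the issue's example). */
    static final int GROUP_WIDTH = 9;

    private InternalBlockCounter() {
    }

    /** Returns the number of distinct internal block indices that were reported. */
    static int countLive(int[] reportedIndices) {
        BitSet seen = new BitSet(GROUP_WIDTH);
        for (int idx : reportedIndices) {
            // Ignore indices outside the block group's range.
            if (idx >= 0 && idx < GROUP_WIDTH) {
                seen.set(idx);
            }
        }
        return seen.cardinality();
    }

    /** Returns how many reports repeat an index that was already seen. */
    static int countDuplicates(int[] reportedIndices) {
        BitSet seen = new BitSet(GROUP_WIDTH);
        int duplicates = 0;
        for (int idx : reportedIndices) {
            if (idx < 0 || idx >= GROUP_WIDTH) {
                continue;
            }
            if (seen.get(idx)) {
                duplicates++;
            } else {
                seen.set(idx);
            }
        }
        return duplicates;
    }
}
```

With the indices from the issue description (b0 through b7 plus a second b0), {{countLive}} returns 8 even though 9 reports arrived, and {{countDuplicates}} returns 1, so the missing b8 shows up without relying on the excess map.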

> BlockManager#countNodes should be able to detect duplicated internal blocks
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-9837
>                 URL: https://issues.apache.org/jira/browse/HDFS-9837
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-9837.000.patch, HDFS-9837.001.patch, HDFS-9837.002.patch, HDFS-9837.003.patch, HDFS-9837.004.patch
>
>
> Currently {{BlockManager#countNodes}} only counts the number of replicas/internal blocks thus it cannot detect the under-replicated scenario where a striped EC block has 9 internal blocks but contains duplicated data/parity blocks. E.g., b8 is missing while 2 b0 exist:
> b0, b1, b2, b3, b4, b5, b6, b7, b0
> If the NameNode keeps running, NN is able to detect the duplication of b0 and will put the block into the excess map. {{countNodes}} excludes internal blocks captured in the excess map thus can return the correct number of live replicas. However, if NN restarts before sending out the reconstruction command, the missing internal block cannot be detected anymore. The following steps can reproduce the issue:
> # create an EC file
> # kill DN1 and wait for the reconstruction to happen
> # start DN1 again
> # kill DN2 and restart NN immediately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)