Date: Fri, 28 Apr 2017 16:33:05 +0000 (UTC)
From: "Daryn Sharp (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

    [ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989115#comment-15989115 ]

Daryn Sharp commented on HDFS-11609:
------------------------------------

+1, pending an update of the comment "We do not use already decommissioned nodes as a source" to mention their use as a last resort.

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --------------------------------------------------------------------------
>
>                 Key: HDFS-11609
>                 URL: https://issues.apache.org/jira/browse/HDFS-11609
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch
>
>
> When all the nodes containing a replica of a block are decommissioned while they are dead, they get decommissioned right away even if there are missing blocks. This behavior was introduced by HDFS-7374.
> The problem starts when those decommissioned nodes are brought back online. The namenode no longer shows missing blocks, which creates a false sense of cluster health. When the decommissioned nodes are removed and reformatted, the block data is permanently lost. The namenode will report missing blocks after the heartbeat recheck interval (e.g. 10 minutes) from the moment the last node is taken down.
> There are multiple issues in the code. As some cause different behaviors in testing vs. production, it took a while to reproduce the problem in a unit test. I will present an analysis and a proposal soon.
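
The "last resort" wording in the review comment suggests a source-selection rule along the following lines. This is only a minimal sketch in Python under stated assumptions, not the code in the attached patches or in the namenode: `Replica`, `AdminState`, and `choose_source` are hypothetical names invented for illustration. It assumes a replication source is normally taken from live nodes that are not decommissioned, and that a live decommissioned node is used only when it holds the sole remaining copy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AdminState(Enum):
    """Hypothetical admin states for a datanode (illustration only)."""
    NORMAL = "normal"
    DECOMMISSIONED = "decommissioned"


@dataclass(frozen=True)
class Replica:
    """Hypothetical record of one replica location; not an HDFS class."""
    node: str
    state: AdminState
    alive: bool


def choose_source(replicas: List[Replica]) -> Optional[Replica]:
    """Pick a node to copy a block from, or None if no live replica exists."""
    # Dead nodes can never serve as a source.
    live = [r for r in replicas if r.alive]
    # Prefer live nodes that have not been decommissioned.
    preferred = [r for r in live if r.state is not AdminState.DECOMMISSIONED]
    if preferred:
        return preferred[0]
    # Last resort: a live but decommissioned node holds the only copy.
    return live[0] if live else None
```

Under this assumed rule, a block whose only live replicas sit on decommissioned nodes can still be copied elsewhere before those nodes are removed and reformatted, which is the data-loss scenario the issue describes.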