Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1806718F28 for ; Mon, 18 Apr 2016 15:41:26 +0000 (UTC) Received: (qmail 80073 invoked by uid 500); 18 Apr 2016 15:41:25 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 79930 invoked by uid 500); 18 Apr 2016 15:41:25 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 79844 invoked by uid 99); 18 Apr 2016 15:41:25 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 18 Apr 2016 15:41:25 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 85E8E2C1F5A for ; Mon, 18 Apr 2016 15:41:25 +0000 (UTC) Date: Mon, 18 Apr 2016 15:41:25 +0000 (UTC) From: "Daryn Sharp (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245867#comment-15245867 ] Daryn Sharp commented on HDFS-10301: ------------------------------------ Enabling HDFS-9198 will fifo process BRs. It doesn't solve this implementation bug but virtually eliminates it from occurring. > Blocks removed by thousands due to falsely detected zombie storages > ------------------------------------------------------------------- > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode > Affects Versions: 2.6.1 > Reporter: Konstantin Shvachko > Priority: Critical > Attachments: zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it sends the block report again. Then NameNode while process these two reports at the same time can interleave processing storages from different reports. This screws up the blockReportId field, which makes NameNode think that some storages are zombie. Replicas from zombie storages are immediately removed, causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)