Date: Tue, 18 Oct 2016 00:07:58 +0000 (UTC)
From: "Wei-Chiu Chuang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-11022) DataNode unable to remove corrupt block replica due to race condition

[ https://issues.apache.org/jira/browse/HDFS-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-11022:
-----------------------------------
Description:
Scenario:
# A client reads a replica blk_A_x from a DataNode and detects corruption.
# In the meantime, the replica is appended to, which updates its generation stamp from x to y.
# The client tells the NN to mark replica blk_A_x as corrupt.
# The NN tells the DataNode to (1) delete replica blk_A_x and (2) replicate the newer replica blk_A_y from another DataNode. Because of the block placement policy, blk_A_y is replicated to the same node (it's a small cluster).
# The DN is unable to receive the newer replica blk_A_y, because a replica of block A already exists on it.
# The DN is also unable to delete replica blk_A_y, because blk_A_y does not exist on it.
# The replica on the DN is not part of the data pipeline, so it becomes stale.

If another replica becomes corrupt and the NameNode wants to replicate a healthy replica to this DataNode, it cannot, because the stale replica is still present. Because this is a small cluster, soon enough (in a matter of an hour) no DataNode is able to receive a healthy replica.

This cluster is also affected by HDFS-11019, so even though the DataNode later detected the data corruption, it was unable to report it to the NameNode.

Note that we are still investigating the root cause of the corruption.

The client accesses the cluster through HttpFS; it appends to finalized blocks and then finalizes the block quickly. It is not a long-running pipeline.
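
Steps 5 to 7 can be replayed in a minimal, hypothetical model of a single DataNode's replica map. Everything here (the `ToyDataNode` class, its `receive` and `delete` methods, and the exception names) is illustrative and is not the HDFS API. The sketch assumes that the node still holds block A with the old generation stamp x, that an incoming replica is rejected when any replica with the same block ID already exists, and that a delete succeeds only on an exact generation-stamp match.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ReplicaAlreadyExists(Exception):
    """Hypothetical: a replica with the same block ID is already stored."""


class ReplicaNotFound(Exception):
    """Hypothetical: no replica with this exact (block ID, gen stamp) exists."""


@dataclass
class ToyDataNode:
    # Hypothetical model: block ID -> generation stamp of the single local replica.
    replicas: dict[str, int] = field(default_factory=dict)

    def receive(self, block_id: str, gen_stamp: int) -> None:
        # Assumption (step 5): a transfer is rejected if ANY replica of this
        # block ID exists locally, whatever its generation stamp.
        if block_id in self.replicas:
            raise ReplicaAlreadyExists(f"{block_id} already exists")
        self.replicas[block_id] = gen_stamp

    def delete(self, block_id: str, gen_stamp: int) -> None:
        # Assumption (step 6): a delete needs an exact generation-stamp match.
        if self.replicas.get(block_id) != gen_stamp:
            raise ReplicaNotFound(f"{block_id}_{gen_stamp} does not exist")
        del self.replicas[block_id]


def replay_race(old_gs: int = 1, new_gs: int = 2) -> list[str]:
    """Replay steps 5-7 against a node that still holds blk_A with the old stamp."""
    dn = ToyDataNode({"blk_A": old_gs})  # the stale replica blk_A_x
    outcomes: list[str] = []
    try:
        dn.receive("blk_A", new_gs)  # step 5: blk_A_y is replicated to this node
        outcomes.append("received")
    except ReplicaAlreadyExists:
        outcomes.append("receive rejected")
    try:
        dn.delete("blk_A", new_gs)  # step 6: delete blk_A_y
        outcomes.append("deleted")
    except ReplicaNotFound:
        outcomes.append("delete rejected")
    # Step 7: the stale replica survives and keeps blocking future transfers.
    outcomes.append(f"stored gen stamp {dn.replicas['blk_A']}")
    return outcomes


if __name__ == "__main__":
    print(replay_race())
```

Running the script prints `['receive rejected', 'delete rejected', 'stored gen stamp 1']`: neither operation clears the stale replica, which is the stuck state described above. In this model the asymmetry (receive keyed on block ID alone, delete keyed on block ID plus generation stamp) is what leaves no operation able to remove the replica; that asymmetry is an assumption drawn from steps 5 and 6, not a statement about the HDFS source.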

> DataNode unable to remove corrupt block replica due to race condition
> ---------------------------------------------------------------------
>
>                 Key: HDFS-11022
>                 URL: https://issues.apache.org/jira/browse/HDFS-11022
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.6.0
>         Environment: CDH5.7.0
>            Reporter: Wei-Chiu Chuang
>            Priority: Critical
>         Attachments: HDFS-11022.png

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)