Date: Fri, 9 Jun 2017 18:01:18 +0000 (UTC)
From: "Kihwal Lee (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11960) Successfully closed files can stay under-replicated.

    [ https://issues.apache.org/jira/browse/HDFS-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044775#comment-16044775 ]

Kihwal Lee commented on HDFS-11960:
-----------------------------------

The simplest fix is to not let {{addBlock()}} remove a pending replication if the reported genstamp is not current.

{code}
-    if (storedBlock != null) {
+    if (storedBlock != null &&
+        block.getGenerationStamp() == storedBlock.getGenerationStamp()) {
       pendingReconstruction.decrement(storedBlock, node);
     }
{code}

This way, the corrupt replica will still be deleted, and if the replication is tried and fails before the deletion, the pending replication will expire and be rescheduled. Even if it is scheduled to the same target again, it will work.

> Successfully closed files can stay under-replicated.
> ----------------------------------------------------
>
>                 Key: HDFS-11960
>                 URL: https://issues.apache.org/jira/browse/HDFS-11960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>
> If a certain set of conditions holds at the time of file creation, a block of the file can stay under-replicated. This is because the block is mistakenly taken out of the under-replicated block queue and never gets reevaluated.
> Re-evaluation can be triggered if
> - a node containing a replica dies,
> - setrep is called, or
> - NN repl queues are reinitialized (NN failover or restart).
> If none of these happens, the block stays under-replicated.
> Here is how it happens.
> 1) A replica is finalized, but the ACK does not reach the upstream in time. The IBR is also delayed.
> 2) A close recovery happens, which updates the gen stamp of "healthy" replicas.
> 3) The file is closed with the healthy replicas. It is added to the replication queue.
> 4) A replication is scheduled, so the block is added to the pending replication list. The replication target is picked as the failed node in 1).
> 5) The old IBR is finally received for the failed/excluded node. In the meantime, the replication fails, because there is already a finalized replica (with an older gen stamp) on the node.
> 6) The IBR processing removes the block from the pending list, adds it to the corrupt replicas list, and then issues an invalidation. Since the block is in neither the replication queue nor the pending list, it stays under-replicated.
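
For illustration only, steps 4) and 5) of the description and the effect of the proposed genstamp check can be replayed in a toy model. This is a minimal sketch, not HDFS code: the class {{PendingSketch}}, the {{Blk}} type, the method names, the node name "dn3" and the gen stamp values are all hypothetical and merely mimic the bookkeeping described in this thread.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the pending-replication bookkeeping; all names are hypothetical. */
class PendingSketch {
    /** Stand-in for a block: an id plus a generation stamp. */
    static final class Blk {
        final long id;
        final long genStamp;
        Blk(long id, long genStamp) { this.id = id; this.genStamp = genStamp; }
    }

    // blockId -> target nodes with an outstanding (pending) replication
    private final Map<Long, Set<String>> pending = new HashMap<>();
    // blockId -> the block as currently recorded (current gen stamp)
    private final Map<Long, Blk> stored = new HashMap<>();
    // true models the patched condition, false models the original one
    private final boolean checkGenStamp;

    PendingSketch(boolean checkGenStamp) { this.checkGenStamp = checkGenStamp; }

    void store(Blk b) { stored.put(b.id, b); }

    void schedule(long blockId, String target) {
        pending.computeIfAbsent(blockId, k -> new HashSet<>()).add(target);
    }

    /** Mimics only the part of the report handling that the patch changes. */
    void onReportedReplica(Blk reported, String node) {
        Blk storedBlock = stored.get(reported.id);
        if (storedBlock == null) {
            return;
        }
        boolean current = reported.genStamp == storedBlock.genStamp;
        // Original behavior: always decrement. Patched: only for a current gen stamp.
        if (!checkGenStamp || current) {
            Set<String> targets = pending.get(reported.id);
            if (targets != null) {
                targets.remove(node);
                if (targets.isEmpty()) {
                    pending.remove(reported.id);
                }
            }
        }
    }

    boolean isPending(long blockId) { return pending.containsKey(blockId); }

    /** Replays steps 4) and 5): schedule to the failed node, then the stale IBR arrives. */
    static boolean stillPendingAfterStaleReport(boolean checkGenStamp) {
        PendingSketch nn = new PendingSketch(checkGenStamp);
        nn.store(new Blk(1L, 1002L));                    // gen stamp after close recovery
        nn.schedule(1L, "dn3");                          // target is the failed node
        nn.onReportedReplica(new Blk(1L, 1001L), "dn3"); // late IBR with the old gen stamp
        return nn.isPending(1L);
    }

    public static void main(String[] args) {
        System.out.println("without check: " + stillPendingAfterStaleReport(false));
        System.out.println("with check:    " + stillPendingAfterStaleReport(true));
    }
}
```

In this model the stale report clears the pending entry when the check is absent, which corresponds to step 6). With the check, the entry stays in the pending list, so in the real system it could still expire and be rescheduled as described in the comment. Expiry and rescheduling themselves are not modeled here.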