Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 1481E200D0F for ; Fri, 29 Sep 2017 23:43:23 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 130DE1609ED; Fri, 29 Sep 2017 21:43:23 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 58F311609BC for ; Fri, 29 Sep 2017 23:43:22 +0200 (CEST) Received: (qmail 74191 invoked by uid 500); 29 Sep 2017 21:43:15 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 74101 invoked by uid 99); 29 Sep 2017 21:43:15 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 29 Sep 2017 21:43:15 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 86387CA96A for ; Fri, 29 Sep 2017 21:43:14 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id 9zShI3Hq8T1I for ; Fri, 29 Sep 2017 21:43:12 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 677FD60D18 for ; Fri, 29 Sep 2017 21:43:12 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id BE519E258D for ; Fri, 29 Sep 2017 21:43:11 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 5D428243A5 for ; Fri, 29 Sep 2017 21:43:10 +0000 (UTC) Date: Fri, 29 Sep 2017 21:43:10 +0000 (UTC) From: "Arun Suresh (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Fri, 29 Sep 2017 21:43:23 -0000 [ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated HDFS-8344: ------------------------------ Is this still on target for 2.9.0 ? If not, can we we push this out to the next major release ? > NameNode doesn't recover lease for files with missing blocks > ------------------------------------------------------------ > > Key: HDFS-8344 > URL: https://issues.apache.org/jira/browse/HDFS-8344 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode > Affects Versions: 2.7.0 > Reporter: Ravi Prakash > Assignee: Ravi Prakash > Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch, HDFS-8344.10.patch, TestHadoop.java > > > I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster > # Before you start it helps if you set. This is not necessary, but simply reduces how long you have to wait > {code} > public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; > public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; > {code} > # Client starts to write a file. (could be less than 1 block, but it hflushed so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) > # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter" > # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) > I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. > The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org