Date: Tue, 1 Oct 2013 23:14:24 +0000 (UTC)
From: "Jing Zhao (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

     [ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-5283:
----------------------------

    Attachment: HDFS-5283.000.patch

Thanks for the fix Vinay! Your analysis makes sense to me, and I think your patch can fix the file-deletion scenario.
For the dir-deletion scenario, instead of changing the current snapshot code (i.e., converting all the INodeFileUC under the deleted dir to INodeFileUCWithSnapshot), I think we can just check whether the full name of the INodeFile retrieved from the blocksMap still represents an INode in the current fsdir tree and, if so, whether that inode is the same as the one in the blocksMap. So I put together a patch based on your existing patch, with the extra check mentioned above and some other small fixes. We can continue working on this patch if you think this is the right direction.

> NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5283
>                 URL: https://issues.apache.org/jira/browse/HDFS-5283
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Vinay
>            Assignee: Vinay
>            Priority: Blocker
>         Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch
>
>
> This is observed in one of our env:
> 1. A MR Job was running which has created some temporary files and was writing to them.
> 2. Snapshot was taken
> 3. And Job was killed and temporary files were deleted.
> 4. Namenode restarted.
> 5. After restart Namenode was in safemode waiting for blocks
> Analysis
> ---------
> 1. Since the snapshot taken also includes the temporary files which were open, and later original files are deleted.
> 2. UnderConstruction blocks count was taken from leases. not considered the UC blocks only inside snapshots
> 3. So safemode threshold count was more and NN did not come out of safemode

--
This message was sent by Atlassian JIRA
(v6.1#6144)
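
For reference, the extra check described above for the dir-deletion scenario can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in HDFS-5283.000.patch: SafeModeSketch, FileNode, currentTree and every method name here are hypothetical stand-ins, and the real INodeFile, blocksMap and fsdir classes are not used. The idea is that a file reached through the blocksMap is treated as still live only if its full path resolves in the current tree and resolves to the same inode object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model. These are NOT the real HDFS classes.
final class SafeModeSketch {

  /** Stand-in for a file inode: a full path plus an under-construction flag. */
  static final class FileNode {
    final String fullPath;
    final boolean underConstruction;

    FileNode(String fullPath, boolean underConstruction) {
      this.fullPath = fullPath;
      this.underConstruction = underConstruction;
    }
  }

  /** Stand-in for the current namespace tree: full path -> live inode. */
  private final Map<String, FileNode> currentTree = new HashMap<>();

  void addToCurrentTree(FileNode file) {
    currentTree.put(file.fullPath, file);
  }

  void deleteFromCurrentTree(String fullPath) {
    currentTree.remove(fullPath);
  }

  /**
   * Step 1: does the full name still resolve in the current tree?
   * Step 2: if yes, is the resolved inode the same object as the one
   * obtained from the (hypothetical) block map?
   */
  boolean isInCurrentTree(FileNode fromBlocksMap) {
    FileNode resolved = currentTree.get(fromBlocksMap.fullPath);
    return resolved != null && resolved == fromBlocksMap;
  }

  /** True for an under-construction file that survives only in a snapshot. */
  boolean isSnapshotOnlyUnderConstruction(FileNode fromBlocksMap) {
    return fromBlocksMap.underConstruction && !isInCurrentTree(fromBlocksMap);
  }

  public static void main(String[] args) {
    SafeModeSketch ns = new SafeModeSketch();
    FileNode tmp = new FileNode("/job/_temporary/part-0", true);
    ns.addToCurrentTree(tmp);
    System.out.println("before delete: " + ns.isSnapshotOnlyUnderConstruction(tmp));
    // Deleting the parent dir removes the path from the current tree;
    // the inode is then reachable only through the snapshot.
    ns.deleteFromCurrentTree("/job/_temporary/part-0");
    System.out.println("after delete: " + ns.isSnapshotOnlyUnderConstruction(tmp));
  }
}
```

The identity comparison in the second step matters when a path is deleted and later recreated: the name resolves again, but to a different inode, so the old under-construction file is still recognised as snapshot-only.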