Date: Mon, 13 Feb 2012 20:23:00 +0000 (UTC)
From: "Suresh Srinivas (Updated) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <363815628.33247.1329164580031.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1061168145.58493.1327018600244.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.

     [ https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HDFS-2815:
----------------------------------

    Target Version/s: 0.24.0, 1.1.0, 0.23.2  (was: 0.23.2, 0.24.0)
       Fix Version/s: 0.23.2
                      0.24.0

I committed the patch to 0.24 and 0.23. Thank you, Uma.

We should also fix this for the 1.1.0 release. However, that is non-trivial, since it requires parts of the functionality from HDFS-173. @Uma, do you want to take a stab at it?

> Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2815
>                 URL: https://issues.apache.org/jira/browse/HDFS-2815
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Critical
>             Fix For: 0.24.0, 0.23.2
>
>         Attachments: HDFS-2815.patch, HDFS-2815.patch
>
>
> While testing HA (internal) with continuous switches about 5 minutes apart, I found some *missing blocks*, and the namenode went into safemode after the next switch.
>
> After analysis, I found that the affected files had already been deleted by clients, but I don't see any delete command logs in the namenode log files. Even so, the namenode had added those blocks to invalidateSets and the DNs had deleted the blocks.
> When the namenode restarted, it went into safemode and kept expecting more blocks to be reported before it would come out of safemode.
> The reason could be that the file was deleted in memory and its blocks were added to invalidates, and only after this does the namenode try to sync the edits into the editlog file. By that time the NN has already asked the DNs to delete those blocks. The namenode then shuts down before persisting to the editlog (the log is behind).
> For this reason we may not get the INFO logs about the delete, and when we restart the namenode (in my scenario it is again a switch), the namenode still expects these deleted blocks, because the delete request was never persisted into the editlog.
> I reproduced this scenario with debug points. *I feel we should not add the blocks to invalidates before persisting into the editlog*.
> Note: for the switch, we used kill -9 (force kill).
> I am currently on version 0.20.2. The same was verified on 0.23 as well, in a normal crash + restart scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
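
The ordering problem described in the quoted issue can be illustrated with a small toy model. This is only a sketch under stated assumptions: `DeleteOrderingSketch`, `ToyEditLog`, and `missingBlocksAfterCrash` are hypothetical names invented for illustration, not HDFS classes, and the model is not the committed patch. It assumes a single file with one block and a crash (as with kill -9) that lands between the two steps of handling a delete, so that datanodes have already acted on whatever was queued for invalidation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these names are real HDFS classes or methods. */
class DeleteOrderingSketch {

    /** Hypothetical edit log: an entry is durable only after sync(). */
    static class ToyEditLog {
        final List<String> buffered = new ArrayList<>();
        final List<String> persisted = new ArrayList<>();

        void log(String op) { buffered.add(op); }

        void sync() {
            persisted.addAll(buffered);
            buffered.clear();
        }
    }

    /**
     * Deletes a one-block file, crashes between the two steps of the delete,
     * restarts by replaying the persisted edits, and returns how many blocks
     * the restarted namenode expects that no datanode still holds.
     */
    static int missingBlocksAfterCrash(boolean syncBeforeInvalidate) {
        ToyEditLog editLog = new ToyEditLog();
        Set<String> datanodeBlocks = new HashSet<>();
        Set<String> invalidateSet = new HashSet<>();

        // The file and its block exist and are durably logged.
        datanodeBlocks.add("blk_1");
        editLog.log("ADD /f blk_1");
        editLog.sync();

        // A client deletes the file; the edit is buffered, not yet durable.
        editLog.log("DELETE /f");
        if (syncBeforeInvalidate) {
            editLog.sync();              // step 1: persist the delete
            // step 2 (queue blk_1 for invalidation) is never reached
        } else {
            invalidateSet.add("blk_1");  // step 1: queue the block for deletion
            // step 2 (sync the edit log) is never reached
        }

        // Datanodes act on whatever was queued, then the namenode is killed
        // and any edits that were only buffered are lost.
        datanodeBlocks.removeAll(invalidateSet);
        editLog.buffered.clear();

        // Restart: rebuild the namespace from the persisted edits only.
        Map<String, String> namespace = new HashMap<>();
        for (String op : editLog.persisted) {
            String[] parts = op.split(" ");
            if (parts[0].equals("ADD")) {
                namespace.put(parts[1], parts[2]);
            } else if (parts[0].equals("DELETE")) {
                namespace.remove(parts[1]);
            }
        }

        // Blocks the namenode still expects but that datanodes cannot report.
        Set<String> missing = new HashSet<>(namespace.values());
        missing.removeAll(datanodeBlocks);
        return missing.size();
    }

    public static void main(String[] args) {
        System.out.println(missingBlocksAfterCrash(false)); // prints 1
        System.out.println(missingBlocksAfterCrash(true));  // prints 0
    }
}
```

In this model, invalidating first leaves the restarted namenode waiting for a block that no datanode can report, which matches the safemode symptom in the report. Syncing first leaves at worst an orphaned block on a datanode, which does not hold up safemode in this toy setup.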