Subject: In progress edit log from last run not being played in case of a cluster (HA) restart
From: Nitin Goyal <nitin2goyal@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 4 Jul 2014 12:59:31 +0530

Hi All,

I am running Hadoop 2.4.0. I am trying to restart my HA cluster, but since there isn't a way to gracefully shut down the NN (AFAIK), I am running into a (sort of) race condition. A client has issued a delete command, and the NN successfully deletes the requested file (the in-progress edit logs across the NN and JNs are updated, and the DNs physically delete the blocks). But before the current in-progress edit log segment can be closed, the NN is stopped. When the NN is started again, it reads all edit logs from the JNs but does not consider the last in-progress edit log from the previous run. Because of this, the NN expects more blocks to be reported than the DNs actually have. Unfortunately, this difference can sometimes be large enough (given dfs.namenode.safemode.threshold-pct) to leave the NN in safemode forever.

This problem looks generic to me. Can someone please confirm whether this is indeed a bug, or point out where I may be wrong (either in my process or my understanding)?

I modified the NN code to also read the in-progress edit log from the JNs, and my problem was resolved. But I am not sure what implications this might have.
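To make the safemode scenario concrete, here is a minimal, self-contained sketch of the arithmetic behind dfs.namenode.safemode.threshold-pct. This is not the actual NN code; the class and method names are made up for illustration. The point is that the NN leaves safemode only once the fraction of reported blocks reaches the threshold, so a lost in-progress segment full of deletions can make that fraction permanently fall short.

```java
// Hypothetical sketch (NOT real Hadoop code) of the safemode threshold check.
public class SafeModeSketch {

    // The NN can leave safemode once reported/expected >= threshold.
    static boolean canLeaveSafeMode(long reported, long expected, double thresholdPct) {
        if (expected == 0) {
            return true; // nothing to wait for
        }
        return (double) reported / expected >= thresholdPct;
    }

    public static void main(String[] args) {
        // Suppose the unread in-progress segment held deletions of 1,000 blocks:
        // the NN still expects 100,000 blocks, but the DNs only report 99,000.
        // With the default threshold of 0.999, the NN never leaves safemode.
        System.out.println(canLeaveSafeMode(99_000L, 100_000L, 0.999)); // false
        System.out.println(canLeaveSafeMode(99_950L, 100_000L, 0.999)); // true
    }
}
```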
Here is the code change I did:

diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
index e78153f..b864ec1 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
@@ -623,7 +623,7 @@ private boolean loadFSImage(FSNamesystem target, StartupOption startOpt,
       }
       editStreams = editLog.selectInputStreams(
           imageFiles.get(0).getCheckpointTxId() + 1,
-          toAtLeastTxId, recovery, false);
+          toAtLeastTxId, recovery, true);
     } else {
       editStreams = FSImagePreTransactionalStorageInspector
           .getEditLogStreams(storage);

--
Regards
Nitin Goyal