From: Andras POTOCZKY
Date: Thu, 18 Dec 2014 10:37:48 +0100
To: user@hadoop.apache.org
Subject: Re: Name Node HA ERROR
Message-ID: <5492A06C.1080608@bsdaemon.hu>

Hi

It seems that either both NameNodes were active for a period, or the standby NameNode process was stopped for a long time.
Tip: on the standby node, try backing up the fsimage and then bootstrapping that node again. Be careful: if you format the NameNode again, you will lose your data on HDFS.

"If you have already formatted the NameNode, or are converting a non-HA-enabled cluster to be HA-enabled, you should now copy over the contents of your NameNode metadata directories to the other, unformatted NameNode by running the command "hdfs namenode -bootstrapStandby" on the unformatted NameNode. Running this command will also ensure that the JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain sufficient edits transactions to be able to start both NameNodes."
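
The backup-then-bootstrap tip could be sketched roughly as the script below. This is only a sketch under assumptions: NAME_DIR and BACKUP_ROOT are hypothetical paths (NAME_DIR should match whatever dfs.namenode.name.dir points to on the standby), and the run helper and DRY_RUN switch are made up here so the commands are only printed by default. It also assumes the standby NameNode process has already been stopped and that you run it as the HDFS user on the standby only.

```shell
#!/bin/sh
# Sketch: back up the standby NameNode metadata, then re-bootstrap it.
# NAME_DIR and BACKUP_ROOT are hypothetical; adjust them to your cluster.
NAME_DIR="${NAME_DIR:-/data/dfs/nn}"
BACKUP_ROOT="${BACKUP_ROOT:-/var/tmp}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rebootstrap_standby() {
    backup="$BACKUP_ROOT/nn-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    # 1. Archive the current metadata directory (fsimage and edits).
    run tar -czf "$backup" -C "$NAME_DIR" .
    # 2. Copy the namespace over from the active NameNode.
    run hdfs namenode -bootstrapStandby
}

rebootstrap_standby
```

With the default DRY_RUN=1 this only prints the two commands so you can review them first; set DRY_RUN=0 to actually run them. Note that the second step is -bootstrapStandby, not -format, and it should never be run on the active NameNode.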

Anyway, here is a link describing other NameNode recovery options:
http://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/

Andras


On 2014.12.18. 5:11, Sajid Syed wrote:
> Hi All,
>
> I have configured CDH4 with HA. It was working fine for some time, but
> now I have started seeing this error and the NameNode has failed over
> to the secondary server.
>
> 2014-12-17 08:44:31,847 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
> org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0.  Expected transaction ID was 1
