Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: error (athena.apache.org: local policy)
Message-ID: <5492A06C.1080608@bsdaemon.hu>
Date: Thu, 18 Dec 2014 10:37:48 +0100
From: Andras POTOCZKY <andrej@bsdaemon.hu>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: user@hadoop.apache.org
Subject: Re: Name Node HA ERROR
References: 
 <CAGSm+-uDwz4hVenNDbZB0sdBfj=2vrkZWZ43HQQ=0jtwt4GUYg@mail.gmail.com>
In-Reply-To: 
 <CAGSm+-uDwz4hVenNDbZB0sdBfj=2vrkZWZ43HQQ=0jtwt4GUYg@mail.gmail.com>
Content-Type: multipart/alternative;
 boundary="------------030706030908060401020408"

This is a multi-part message in MIME format.
--------------030706030908060401020408
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Hi

It seems both namenodes were active for a period, or the standby node 
process was stopped for a long time.
Tip: on the standby node, try to back up the fsimage and bootstrap that 
node again. Be careful: if you format the namenode again, you will lose 
your data on HDFS.

"If you have already formatted the NameNode, or are converting a 
non-HA-enabled cluster to be HA-enabled, you should now copy over the 
contents of your NameNode metadata directories to the other, unformatted 
NameNode by running the command "hdfs namenode -bootstrapStandby" on 
the unformatted NameNode. Running this command will also ensure that the 
JournalNodes (as configured by *dfs.namenode.shared.edits.dir*) contain 
sufficient edits transactions to be able to start both NameNodes."
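
As a rough sketch of the tip above, the two steps could look like the 
script below. The metadata path and the backup location are my 
assumptions, not values from your cluster, so check 
dfs.namenode.name.dir in your hdfs-site.xml first. The sketch also 
assumes the standby namenode process is already stopped. By default it 
only prints the commands (DRY_RUN=1); the bootstrap command may ask for 
confirmation before it overwrites existing metadata.

```shell
#!/bin/sh
# Sketch: back up the standby NameNode's metadata directory, then re-bootstrap it.
# Assumptions (hypothetical, not from this thread): NAME_DIR is the local
# dfs.namenode.name.dir path and the standby NameNode process is stopped.
# With DRY_RUN=1 (the default) commands are printed instead of executed.

NAME_DIR="${NAME_DIR:-/data/dfs/nn}"        # hypothetical path; check hdfs-site.xml
BACKUP="${BACKUP:-/tmp/nn-backup.tar.gz}"   # hypothetical backup location
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebootstrap_standby() {
    # 1. Keep a copy of the current fsimage and edits before touching anything.
    run tar -czf "$BACKUP" -C "$NAME_DIR" . || return 1
    # 2. Copy the metadata from the active NameNode (run as the HDFS user).
    run hdfs namenode -bootstrapStandby
}

rebootstrap_standby
```

Run it once as is to review the printed commands, then with DRY_RUN=0 
on the standby node only. It never runs "namenode -format".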

Anyway, here is a link describing other namenode recovery options:
http://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/

Andras


On 2014.12.18. 5:11, Sajid Syed wrote:
> Hi All,
>
> I have configured CDH4 with HA. It was working fine for some time and 
> now I started seeing this error and namenode had failed over to 
> secondary server.
>
>
> 2014-12-17 08:44:31,847 FATAL 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode 
> join
> org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error 
> replaying edit log at offset 0.  Expected transaction ID was 1


--------------030706030908060401020408
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit

<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Hi<br>
      <br>
      It seems both namenodes were active for a period, or the standby
      node process was stopped for a long time.<br>
      Tip: on the standby node, try to back up the fsimage and bootstrap
      that node again. Be careful: if you format the namenode again,
      you will lose your data on HDFS.<br>
      <br>
      "If you have already formatted the NameNode, or are converting a
      non-HA-enabled cluster to be HA-enabled, you should now copy over
      the contents of your NameNode metadata directories to the other,
      unformatted NameNode by running the command "<i>hdfs namenode
        -bootstrapStandby</i>" on the unformatted NameNode. Running this
      command will also ensure that the JournalNodes (as configured by <b>dfs.namenode.shared.edits.dir</b>)
      contain sufficient edits transactions to be able to start both
      NameNodes."<br>
      <br>
      Anyway, here is a link describing other namenode recovery options:<br>
<a class="moz-txt-link-freetext" href="http://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/">http://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/</a><br>
      <br>
      Andras<br>
      <br>
      <br>
      On 2014.12.18. 5:11, Sajid Syed wrote:<br>
    </div>
    <blockquote
cite="mid:CAGSm+-uDwz4hVenNDbZB0sdBfj=2vrkZWZ43HQQ=0jtwt4GUYg@mail.gmail.com"
      type="cite">
      <div>Hi All,</div>
      <div><br>
      </div>
      <div>I have configured CDH4 with HA. It was working fine for some
        time and now I started seeing this error and namenode had failed
        over to secondary server.</div>
      <div><br>
      </div>
      <div><br>
      </div>
      <div>2014-12-17 08:44:31,847 FATAL
        org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in
        namenode join</div>
      <div>org.apache.hadoop.hdfs.server.namenode.EditLogInputException:
        Error replaying edit log at offset 0.  Expected transaction ID
        was 1</div>
    </blockquote>
    <br>
  </body>
</html>

--------------030706030908060401020408--