From: Robert Dyer
Reply-To: rdyer@iastate.edu
To: user@hadoop.apache.org
Date: Wed, 26 Dec 2012 08:17:15 -0600
Subject: Re: why not hadoop backup name node data to local disk daily or hourly?

I actually have this exact same error. After running my NameNode for a while (with a SecondaryNameNode), it gets to a point where the SNN starts crashing, and if I try to restart the NN I hit the same problem. I typically wind up going back to a much older copy of the fsimage and edits files to get it up and running again, which naturally means data loss.
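
Given the subject line of this thread, one stopgap worth considering is snapshotting the NameNode metadata directory to local disk on a schedule, so the copy you fall back on is hours old rather than weeks old. What follows is only a rough sketch of the idea; the paths are made-up assumptions, so point it at your actual dfs.name.dir, and take the copy while the edits file is quiet (e.g. right after the SecondaryNameNode finishes a checkpoint) so the snapshot is consistent.

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper: copies the NameNode metadata directory (fsimage, edits,
// VERSION, ...) into a timestamped backup directory on local disk. The default
// paths below are placeholders, not anything Hadoop defines.
public class NameDirBackup {
    public static void main(String[] args) throws IOException {
        final Path nameDir = Paths.get(args.length > 0 ? args[0] : "/data/hadoop/name");
        final Path backupRoot = Paths.get(args.length > 1 ? args[1] : "/backup/namenode");
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date());
        final Path target = backupRoot.resolve(stamp);
        Files.createDirectories(target);

        // Recursively copy everything under the metadata directory.
        Files.walkFileTree(nameDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                Files.createDirectories(target.resolve(nameDir.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.copy(file, target.resolve(nameDir.relativize(file)), StandardCopyOption.COPY_ATTRIBUTES);
                return FileVisitResult.CONTINUE;
            }
        });
        System.out.println("Copied " + nameDir + " to " + target);
    }
}

Run it hourly from cron and prune old snapshots as needed; it is no substitute for a healthy SecondaryNameNode, just something local to fall back on that is not weeks old.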

On Mon, Dec 24, 2012 at 8:22 PM, 周梦想 <ablozhou@gmail.com> wrote:
Thanks Tariq,
Now we are trying to recover the data, but some of it has been lost forever.

The logs just reported a NullPointerException:
2012-12-17 17:09:05,646 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1094)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1106)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1009)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:208)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:626)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1015)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:833)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
We changed the Hadoop source to catch this exception and rebuilt it; after that we could start the Hadoop NN, but the HBase problem remained.
So we have to upgrade HBase and try to repair the HBase meta data from the region data.
Now we are planning to upgrade to the stable versions, Hadoop 1.0.4 and HBase 0.94.3.
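
For anyone hitting the same NullPointerException: the change we made was essentially a guard around the per-record edit-log replay so that one corrupt record does not abort the whole NameNode startup. The class below is only an illustration of that shape with stand-in names (EditRecord, applyEditRecord), not the actual FSDirectory/FSEditLog code, and skipping records is silent data loss, so treat it strictly as a last resort.

import java.util.Iterator;
import java.util.logging.Logger;

// Illustration only -- NOT the real Hadoop source. EditRecord and
// applyEditRecord() are hypothetical stand-ins for a parsed edit-log
// operation and the apply step (FSDirectory.addChild etc. in our case).
public class LenientEditLogReplay {
    private static final Logger LOG = Logger.getLogger(LenientEditLogReplay.class.getName());

    interface EditRecord { }

    static void applyEditRecord(EditRecord record) {
        // In the real code this is where the NullPointerException surfaced.
    }

    public static void replay(Iterator<EditRecord> editLog) {
        int applied = 0;
        int skipped = 0;
        while (editLog.hasNext()) {
            EditRecord record = editLog.next();
            try {
                applyEditRecord(record);
                applied++;
            } catch (NullPointerException npe) {
                // Skip the corrupt record instead of aborting startup.
                LOG.warning("Skipping corrupt edit record: " + npe);
                skipped++;
            }
        }
        LOG.info("Edit log replay done: " + applied + " applied, " + skipped + " skipped");
    }
}

Your line numbers and classes will differ depending on the Hadoop version, so take this only as the general idea.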

Best regards,
Andy
2012/12/24 Mohammad Tariq <dontariq@gmail.com>
Hello Andy,

I hope you are stable now :)

Just a quick question. Did you find anything interesting in the NN, SNN, DN logs?

And my grandma says I look like Abhishek Bachchan ;)

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
On Mon, Dec 24, 2012 at 4:24 PM, 周梦想 <ablozhou@gmail.com> wrote:
I stopped Hadoop, changed every node's IP, reconfigured, and started Hadoop again. Yes, we did change the IP of the NN.


2012/12/24 Nitin Pawar <nitinpawar432@gmail.com>
What do you mean by "We changed all IPs of the Hadoop System"?

Did you change the IPs of the nodes in one go, or did you retire nodes one by one, change their IPs, and bring them back into rotation? Also, did you change the IP of your NN as well?

On Mon, Dec 24, 2012 at 4:10 PM, 周梦想 <ablozhou@gmail.com> wrote:
Actually, the problem began at the SecondaryNameNode. We changed all the IPs of the Hadoop system.



--
Nitin Pawar