From: Robert Dyer
Reply-To: rdyer@iastate.edu
To: user@hadoop.apache.org
Date: Wed, 26 Dec 2012 08:17:15 -0600
Subject: Re: why not hadoop backup name node data to local disk daily or hourly?

I actually have this exact same error. After running my NameNode for a while (with a SecondaryNameNode), it gets to a point where the SNN starts crashing, and if I try to restart the NN I hit the same problem. I typically wind up going back to a much older copy of the fsimage and edits files to get it up and running again, which naturally means data loss.
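
Given the subject line of this thread, one stopgap worth considering is snapshotting the NameNode metadata directory to local disk on a schedule, so the copy you fall back on is hours old rather than weeks old. What follows is only a rough sketch of the idea; the paths are made-up assumptions, so point it at your actual dfs.name.dir, and take the copy while the edits file is quiet (e.g. right after the SecondaryNameNode finishes a checkpoint) so the snapshot is consistent.

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper: copies the NameNode metadata directory (fsimage, edits,
// VERSION, ...) into a timestamped backup directory on local disk. The default
// paths below are placeholders, not anything Hadoop defines.
public class NameDirBackup {
    public static void main(String[] args) throws IOException {
        final Path nameDir = Paths.get(args.length > 0 ? args[0] : "/data/hadoop/name");
        final Path backupRoot = Paths.get(args.length > 1 ? args[1] : "/backup/namenode");
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date());
        final Path target = backupRoot.resolve(stamp);
        Files.createDirectories(target);

        // Recursively copy everything under the metadata directory.
        Files.walkFileTree(nameDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                Files.createDirectories(target.resolve(nameDir.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.copy(file, target.resolve(nameDir.relativize(file)), StandardCopyOption.COPY_ATTRIBUTES);
                return FileVisitResult.CONTINUE;
            }
        });
        System.out.println("Copied " + nameDir + " to " + target);
    }
}

Run it hourly from cron and prune old snapshots as needed; it is no substitute for a healthy SecondaryNameNode, just something local to fall back on that is not weeks old.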

On Mon, Dec 24, 2012 at 8:22 PM, 周梦想 <ablozhou@gmail.com> wrote:
Thanks Tariq,
Now we are trying to recover the data, but some of it has been lost forever.

The logs just reported a NullPointerException:
2012-12-17 17:09:05,646 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1094)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1106)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1009)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:208)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:626)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1015)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:833)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
We changed the Hadoop source to catch this exception and rebuilt it; after that we could start the Hadoop NN, but the HBase problem remained.
So we have to upgrade HBase and try to repair the HBase meta data from the region data.
Now we are planning to upgrade to the stable versions, Hadoop 1.0.4 and HBase 0.94.3.
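
For anyone hitting the same NullPointerException: the change we made was essentially a guard around the per-record edit-log replay so that one corrupt record does not abort the whole NameNode startup. The class below is only an illustration of that shape with stand-in names (EditRecord, applyEditRecord), not the actual FSDirectory/FSEditLog code, and skipping records is silent data loss, so treat it strictly as a last resort.

import java.util.Iterator;
import java.util.logging.Logger;

// Illustration only -- NOT the real Hadoop source. EditRecord and
// applyEditRecord() are hypothetical stand-ins for a parsed edit-log
// operation and the apply step (FSDirectory.addChild etc. in our case).
public class LenientEditLogReplay {
    private static final Logger LOG = Logger.getLogger(LenientEditLogReplay.class.getName());

    interface EditRecord { }

    static void applyEditRecord(EditRecord record) {
        // In the real code this is where the NullPointerException surfaced.
    }

    public static void replay(Iterator<EditRecord> editLog) {
        int applied = 0;
        int skipped = 0;
        while (editLog.hasNext()) {
            EditRecord record = editLog.next();
            try {
                applyEditRecord(record);
                applied++;
            } catch (NullPointerException npe) {
                // Skip the corrupt record instead of aborting startup.
                LOG.warning("Skipping corrupt edit record: " + npe);
                skipped++;
            }
        }
        LOG.info("Edit log replay done: " + applied + " applied, " + skipped + " skipped");
    }
}

Your line numbers and classes will differ depending on the Hadoop version, so take this only as the general idea.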

Best regards,
Andy
2012/12/24 Mohammad Tariq <dontariq@gmail.com>
Hello Andy,

I hope you are stable now :)

Just a quick question. Did you find anything interesting in the NN, SNN, DN logs?

And my grandma says I look like Abhishek Bachchan ;)

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
On Mon, Dec 24, 2012 at 4:24 PM, 周梦想 <ablozhou@gmail.com> wrote:
I stopped Hadoop, changed every node's IP, reconfigured, and started Hadoop again. Yes, we did change the IP of the NN.


2012/12/24 Nitin Pawar <nitinpawar432@gmail.com>
What do you mean by "We changed all IPs of the Hadoop System"?

Did you change the IPs of the nodes in one go, or did you retire nodes one by one, change their IPs, and bring them back into rotation? Also, did you change the IP of your NN as well?

On Mon, Dec 24, 2012 at 4:10 PM, 周梦想 <ablozhou@gmail.com> wrote:
Actually, the problem began at the SecondaryNameNode. We changed all the IPs of the Hadoop system.



--
Nitin Pawar