From: Tsz Wo Sze
To: "user@hadoop.apache.org", "rdyer@iastate.edu"
Subject: Re: HDFS edit log NPE
Date: Wed, 5 Jun 2013 00:41:21 -0700 (PDT)
Message-ID: <1370418081.59277.YahooMailNeo@web125702.mail.ne1.yahoo.com>

Is it an operation error on upgrade since the edit is non-empty?  The original image and edit should still be available.  If that is the case, I suggest starting the NN with 1.0.4 so that the edit becomes empty, and then trying the upgrade again.

> Recent opcode offsets: 5 14

BTW, opcode 5 is OP_DATANODE_ADD, which was deprecated a long time ago.  It seems that v1.1.2 cannot understand the v1.0.4 edit.  Otherwise, the edit log is corrupted.

Hope it helps.
Tsz-Wo

________________________________
From: Robert Dyer
To: "user@hadoop.apache.org"
Sent: Tuesday, June 4, 2013 2:12 PM
Subject: HDFS edit log NPE

I recently upgraded from 1.0.4 to 1.1.2.  Now, however, my HDFS won't start up.  There appears to be something wrong in the edits file.

Obviously I can roll back to a previous checkpoint; however, it appears checkpointing has been failing for some time and my last checkpoint is over a month old.

Is there a way to manually edit/inspect the edits file in 1.1.2 so I can fix this?  What is causing this bug?

-------------------------------------------

2013-06-04 01:07:15,952 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1111
2013-06-04 01:07:16,071 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 7
2013-06-04 01:07:16,073 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 270269 loaded in 0 seconds.
2013-06-04 01:07:16,075 ERROR org.apache.hadoop.hdfs.server.common.Storage: Error replaying edit log at offset 132
Recent opcode offsets: 5 14
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1124)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1136)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1021)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1008)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:756)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1025)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:841)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:377)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:379)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:284)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:536)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1419)
2013-06-04 01:07:16,077 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Error replaying edit log at offset 132
Recent opcode offsets: 5 14
        at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:84)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:929)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1025)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:841)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:377)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:379)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:284)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:536)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1419)
2013-06-04 01:07:16,078 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: Error replaying edit log at offset 132
Recent opcode offsets: 5 14
        at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:84)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:929)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1025)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:841)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:377)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:379)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:284)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:536)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1419)

2013-06-04 01:07:16,078 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
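On the question of inspecting the edits file by hand: the thread does not name a tool for this, so the following is only a rough sketch of one way to look at the raw bytes near the positions the NameNode reported. It pulls the failing offset (132) and the "Recent opcode offsets" (5 14) out of the log text and hex-dumps a few bytes of the file at each position. It assumes those numbers are byte offsets into the edits file, which the log does not state outright, and it makes no attempt to decode the edit log record format. The names parse_offsets, hex_window and inspect are made up for this illustration and are not part of Hadoop.

```python
import re

# Patterns for the two log lines quoted in this thread.
OFFSET_RE = re.compile(r"Error replaying edit log at offset (\d+)")
RECENT_RE = re.compile(r"Recent opcode offsets:((?: \d+)+)")


def parse_offsets(log_text):
    """Return (failing_offset, recent_offsets) found in NameNode log text.

    failing_offset is None and recent_offsets is [] when the
    corresponding line is not present.
    """
    failing_match = OFFSET_RE.search(log_text)
    failing = int(failing_match.group(1)) if failing_match else None
    recent_match = RECENT_RE.search(log_text)
    recent = [int(tok) for tok in recent_match.group(1).split()] if recent_match else []
    return failing, recent


def hex_window(data, offset, before=0, after=16):
    """Format the bytes in data[offset - before : offset + after] as hex.

    The line is prefixed with the starting position in hex so it can be
    compared against the output of a regular hex dump tool.
    """
    start = max(0, offset - before)
    chunk = data[start:offset + after]
    return f"{start:08x}  " + " ".join(f"{byte:02x}" for byte in chunk)


def inspect(data, log_text, after=16):
    """Return one hex line per offset mentioned in the log.

    Assumption (not confirmed in the thread): the offsets are byte
    positions in the edits file whose contents are passed in as data.
    """
    failing, recent = parse_offsets(log_text)
    offsets = list(recent)
    if failing is not None:
        offsets.append(failing)
    return [hex_window(data, off, after=after) for off in offsets]
```

To try it, read a copy of the edits file (never the live one in the NameNode's name directory) with open(path, "rb").read() and pass the bytes plus the log text to inspect(). If the byte at a reported opcode position really is 0x05, that would be consistent with the OP_DATANODE_ADD reading above, but that is for the reader to check against the file, not something this sketch establishes.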