From: "Andrew Purtell (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Fri, 26 Feb 2010 17:39:33 +0000 (UTC)
Subject: [jira] Commented: (HDFS-686) NullPointerException is thrown while merging edit log and image
Message-ID: <1303562683.559601267205973240.JavaMail.jira@brutus.apache.org>
In-Reply-To: <383576589.1255021831718.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838956#action_12838956 ]

Andrew Purtell commented on HDFS-686:
-------------------------------------

Hairong,

We are using 0.20.

We are trying to recover a corrupt volume with about 3 TB of data we'd like to get back. We have followed the steps on the page http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#Secondary+NameNode but are getting exceptions that look very similar to the ones posted on this issue.

We tried to use the openNPE patch. The result looks the same:

2010-02-26 05:47:17,598 ERROR org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Unexpected block size: -3664558185340993536
2010-02-26 05:47:17,600 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1007)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:993)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:967)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:954)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:696)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1001)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

It doesn't work.

Btw, we tried to skip all of these corrupt records by modifying Block.java. However, we then saw this:

2010-02-26 03:34:58,627 ERROR org.apache.hadoop.hdfs.server.common.Storage: java.lang.IllegalArgumentException: No enum const class org.apache.hadoop.hdfs.protocol.DatanodeInfo$AdminStates.hive_2009_05_31_VSAPI_001_12.VSAPI_001.10637.0.archive.pb$����&;=��ٴ��
2010-02-26 03:34:58,629 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1007)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:993)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:967)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:954)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:696)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1001)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

It looks like some field in the record is not a valid value.
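The "skip corrupt records by modifying Block.java" workaround described above can be sketched roughly as follows. This is a hypothetical, simplified reader, not the actual Block.java change; the (blockId, numBytes, genStamp) record layout, the class name, and the validity check are assumptions for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: read (blockId, numBytes, genStamp) triples and skip
// entries whose length field is implausible, instead of throwing IOException.
public class SkipCorruptBlocks {
    static List<long[]> readBlocks(DataInputStream in, int count) throws IOException {
        List<long[]> good = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            long id = in.readLong();
            long len = in.readLong();
            long gen = in.readLong();
            if (len < 0) {
                // A negative length such as -3664558185340993536 usually means
                // the stream is corrupt or misaligned; skip this record.
                continue;
            }
            good.add(new long[] { id, len, gen });
        }
        return good;
    }

    public static void main(String[] args) throws IOException {
        // Build a buffer with one good record, one corrupt record, one good record.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeLong(1); out.writeLong(64L * 1024 * 1024); out.writeLong(1000);
        out.writeLong(2); out.writeLong(-3664558185340993536L); out.writeLong(1001);
        out.writeLong(3); out.writeLong(128L * 1024 * 1024); out.writeLong(1002);
        List<long[]> blocks = readBlocks(
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray())), 3);
        System.out.println(blocks.size()); // prints 2
    }
}
```

Note that skipping a record without re-aligning the stream only helps when the surrounding records are still well-formed; if the stream itself is misaligned, later fields will decode as garbage, which is consistent with the bogus enum constant seen in the log above.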
Do you have any idea?

Can we provide you the snapshot to take a look at? We would really like to recover this volume.

> NullPointerException is thrown while merging edit log and image
> ---------------------------------------------------------------
>
>                 Key: HDFS-686
>                 URL: https://issues.apache.org/jira/browse/HDFS-686
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1, 0.21.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>         Attachments: nullSetTime.patch, openNPE-trunk.patch, openNPE.patch
>
>
> Our secondary name node is not able to start on NullPointerException:
> ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1232)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1221)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:776)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:590)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
>         at java.lang.Thread.run(Thread.java:619)
> This was caused by setting access time on a non-existent file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
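For readers hitting the same NullPointerException: the root cause quoted in the issue description (replaying a setTimes edit for a file that no longer exists) suggests a null guard of roughly the following shape. This is a hypothetical sketch of the failure mode, not the actual nullSetTime.patch; the class, map-based namespace, and method signature are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the failure mode: replaying a setTimes edit for a
// path whose inode was already deleted looks up null, and dereferencing it
// throws NullPointerException. The guard ignores the stale edit instead.
public class SetTimesGuard {
    static class INode { long mtime, atime; }

    final Map<String, INode> namespace = new HashMap<>();

    // Returns true if the edit was applied, false if the file no longer exists.
    boolean unprotectedSetTimes(String path, long mtime, long atime) {
        INode inode = namespace.get(path);
        if (inode == null) {
            return false; // file was deleted; skip the stale edit, no NPE
        }
        inode.mtime = mtime;
        inode.atime = atime;
        return true;
    }

    public static void main(String[] args) {
        SetTimesGuard fs = new SetTimesGuard();
        fs.namespace.put("/a", new INode());
        System.out.println(fs.unprotectedSetTimes("/a", 1, 2));       // prints true
        System.out.println(fs.unprotectedSetTimes("/deleted", 1, 2)); // prints false
    }
}
```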