hadoop-hdfs-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-686) NullPointerException is thrown while merging edit log and image
Date Fri, 26 Feb 2010 17:39:33 GMT

    [ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838956#action_12838956 ]

Andrew Purtell commented on HDFS-686:
-------------------------------------

Hairong,

We are using 0.20. 

We are trying to recover a corrupt volume with about 3 TB of data we'd like to get back. We
have followed the steps at http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#Secondary+NameNode
but are getting exceptions very similar to the ones posted on this issue.

We tried the openNPE patch. The result looks the same:

2010-02-26 05:47:17,598 ERROR org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Unexpected block size: -3664558185340993536
2010-02-26 05:47:17,600 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1007)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:993)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:967)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:954)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:696)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1001)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

It doesn't work.

Btw, we also tried to skip all of these corrupt records by modifying Block.java (a
sketch of the kind of change we mean follows the second trace below). However, we
then saw this:

2010-02-26 03:34:58,627 ERROR org.apache.hadoop.hdfs.server.common.Storage: java.lang.IllegalArgumentException: No enum const class org.apache.hadoop.hdfs.protocol.DatanodeInfo$AdminStates.hive_2009_05_31_VSAPI_001_12.VSAPI_001.10637.0.archive.pb$����&;=��ٴ��
2010-02-26 03:34:58,629 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1007)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:993)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:967)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:954)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:696)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1001)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

It looks like some field in the record holds an invalid value. Do you have any idea what is going on?
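
For reference, our Block.java change was along these lines. This is a minimal
sketch, not our actual diff; the class name, method name, and standalone
structure are illustrative, not Hadoop's:

import java.io.DataInput;
import java.io.IOException;

// Sketch: validate fields while deserializing a block record so the
// caller can skip obviously corrupt entries instead of propagating
// garbage values like the -3664558185340993536 in the first trace.
class LenientBlockReader {
    long blockId;
    long numBytes;
    long generationStamp;

    /** Returns false when the record is clearly corrupt. */
    boolean readFields(DataInput in) throws IOException {
        blockId = in.readLong();
        numBytes = in.readLong();
        generationStamp = in.readLong();
        // A block can never have a negative length, so a negative
        // value can only come from misaligned or corrupted bytes.
        return numBytes >= 0;
    }
}

The second trace hints at why skipping alone may not be enough: once a record is
mis-read, the stream position is off, and every following field is decoded from
unrelated bytes. That would explain how file-name bytes like
hive_2009_05_31_VSAPI_001... end up being parsed as an AdminStates enum
constant. A standalone demonstration of the effect (plain Java, nothing to do
with Hadoop's actual record layout):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;

public class MisalignedRead {
    public static void main(String[] args) throws Exception {
        // Bytes of a path name, as they might sit in the edit log...
        byte[] pathBytes = "hive_2009_05_31_VSAPI_001".getBytes("US-ASCII");
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(pathBytes));
        // ...misinterpreted as a numeric field once alignment is lost:
        System.out.println(in.readLong()); // a huge, meaningless value
    }
}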

Can we provide you with the snapshot so you can take a look? We would really like to recover this volume.

> NullPointerException is thrown while merging edit log and image
> ---------------------------------------------------------------
>
>                 Key: HDFS-686
>                 URL: https://issues.apache.org/jira/browse/HDFS-686
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1, 0.21.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>         Attachments: nullSetTime.patch, openNPE-trunk.patch, openNPE.patch
>
>
> Our secondary name node is not able to start due to a NullPointerException:
> ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1232)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1221)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:776)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:590)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
>         at java.lang.Thread.run(Thread.java:619)
> This was caused by setting access time on a non-existent file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

