hadoop-hdfs-user mailing list archives

From Dan Harvey <dan.har...@mendeley.com>
Subject HDFS-909 name node edit log corruption
Date Thu, 02 Dec 2010 11:20:26 GMT
Hey,

Yesterday we restarted our Name Node for the first time in a while to push
out some new configuration updates to it. When it started up again we got
this error:

2010-12-01 10:59:39,635 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 121229
2010-12-01 10:59:41,578 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 126
2010-12-01 10:59:41,598 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 19581054 loaded in 1 seconds.
2010-12-01 10:59:41,600 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1073)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1085)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:992)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:195)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:615)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:999)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:312)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:293)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:224)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1004)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1013)

I tracked this down to a race condition in how the edit log is saved:
https://issues.apache.org/jira/browse/HDFS-909

We fixed this by applying the patch from
https://issues.apache.org/jira/browse/HDFS-1002, which meant we could start
the name node and let it fix the edit log, but it also meant we lost some
files from HDFS.
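
To illustrate the kind of failure the stack trace shows, here is a toy Java
model. It is not the HDFS code and not the HDFS-1002 patch. It assumes the
damaged edit log is missing the operation that created a parent directory, so
a later add-file operation finds no parent during replay. The class
EditReplaySketch, its methods, and the "MKDIR"/"ADD" record format are all
hypothetical names made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT the real FSDirectory/FSEditLog implementation.
public class EditReplaySketch {
    // Directory path -> paths of its children.
    private final Map<String, Set<String>> dirs = new HashMap<>();
    // Edit records that the tolerant replay had to drop.
    private final List<String> skipped = new ArrayList<>();

    public EditReplaySketch() {
        dirs.put("/", new HashSet<>()); // the root always exists
    }

    // "/a/f1" -> "/a", "/a" -> "/"
    public static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    // Apply one record of the (assumed) form "MKDIR <path>" or "ADD <path>".
    public void apply(String record, boolean tolerant) {
        String[] parts = record.split(" ", 2);
        String op = parts[0];
        String path = parts[1];
        Set<String> parent = dirs.get(parentOf(path));
        if (parent == null && tolerant) {
            skipped.add(record); // drop the record: this file is lost
            return;
        }
        parent.add(path); // strict mode: NullPointerException if parent is missing
        if (op.equals("MKDIR")) {
            dirs.put(path, new HashSet<>());
        }
    }

    // True if a strict replay of the records dies with a NullPointerException.
    public static boolean strictReplayFails(List<String> records) {
        EditReplaySketch fs = new EditReplaySketch();
        try {
            for (String r : records) {
                fs.apply(r, false);
            }
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Replays every record, returning the ones that had to be skipped.
    public static List<String> replayTolerant(List<String> records) {
        EditReplaySketch fs = new EditReplaySketch();
        for (String r : records) {
            fs.apply(r, true);
        }
        return fs.skipped;
    }

    public static void main(String[] args) {
        // The "MKDIR /b" record is missing, as if it had been lost from the log.
        List<String> records = Arrays.asList("MKDIR /a", "ADD /a/f1", "ADD /b/f2");
        System.out.println("strict replay fails: " + strictReplayFails(records));
        System.out.println("tolerant replay skipped: " + replayTolerant(records));
    }
}
```

In this sketch the strict replay aborts on the first record whose parent is
missing, while the tolerant replay finishes but drops that record, which is
the same trade-off we saw: the name node starts, but some files are gone.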

We're using CDH2-169.68, and this bug was fixed in CDH2-169.113, released in
September, so I would recommend everyone upgrade to that!

Thanks,

-- 
Dan Harvey | Datamining Engineer
www.mendeley.com/profiles/dan-harvey

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
