hadoop-hdfs-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: NameNode corruption: NPE addChild at start up
Date Mon, 24 Oct 2011 20:24:40 GMT
Log output prior to first shutdown:

Many StateChange lines:
..
2011-10-24 10:09:56,665 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 
178.21.118.71:50010 to delete  blk_-6474991874786417864_1
10366 blk_4835119182915024496_110378 blk_5400176265071942037_110378 
blk_3844651748475104977_110378 blk_-1608821730784127393_110376 bl
k_6509383172138875668_110378 blk_5451246938053636538_110378 
blk_2897026915704967576_110378 blk_-8171354591956681743_110358 blk_831230
8457484333010_110378 blk_-3230829712632219383_110378
2011-10-24 10:09:56,665 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 
178.21.118.127:50010 to delete  blk_-3399604157806337594_
110378 blk_6747099268224653868_110330 blk_-8319971768460712343_110366 
blk_-2961731871717799971_110378 blk_7869501598834020846_110374 
blk_8093290740979129346_110370 blk_-6013635848766803390_110378 
blk_7993259237237421138_110378 blk_5759478854526777476_110378 blk_-249
5357027905458427_110366 blk_-1854083208115601228_110366
2011-10-24 10:09:56,665 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 
178.21.118.106:50010 to delete  blk_-7802030530595759204_
110378 blk_8721277224606661791_110374 blk_6747099268224653868_110330 
blk_5656147162240510324_110378 blk_2897026915704967576_110378 bl
k_-240991845576310137_110378 blk_5938453705724038271_110378 
blk_8312308457484333010_110378 blk_-1854083208115601228_110366 blk_469356
3029097482715_110378 blk_2642639928334327569_110374 
blk_-1087530163655362136_110378 blk_-330417000415464230_110378 
blk_54512469380536
36538_110378 blk_6300864378343690857_110378 blk_9188777915480180330_110378 
blk_6198541037144409710_110378
2011-10-24 10:09:56,665 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 
178.21.118.102:50010 to delete  blk_-8874651062228528404_
110378 blk_-3399604157806337594_110378 blk_5374073877837004061_110338 
blk_-3734465814030860048_110370 blk_4355408303621436967_110378 
blk_4924117654799449604_110378 blk_1289214837876543883_110378 
blk_-240991845576310137_110378 blk_5938453705724038271_110378 blk_-8294
59405706536971_110370 blk_-7851154197821762387_110322 
blk_2792433696712651711_110378 blk_-361468857299887395_110378 
blk_5345112692694
71506_110342 blk_-7018473240249214767_110378 blk_5393329649654239848_110370
2011-10-24 10:10:12,065 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 
1281 Total time for transac
tions(ms): 21Number of transactions batched in Syncs: 10 Number of syncs: 870 
SyncTimes(ms): 13578 
2011-10-24 10:10:26,250 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
SHUTDOWN_MSG
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenode01.openindex.io/178.21.118.98
************************************************************/


Following startup:

..
2011-10-24 10:11:58,601 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), 
accessTokenLifetime=0 min(s)
2011-10-24 10:12:00,478 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered 
FSNamesystemStateMBean and NameNodeMXBean
2011-10-24 10:12:00,592 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
Caching file names occuring more than 10 times 
2011-10-24 10:12:00,609 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Number of files = 58901
2011-10-24 10:12:17,101 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Number of files under construction = 1
2011-10-24 10:12:17,373 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Image file of size 8688539 loaded in 16 seconds.
2011-10-24 10:12:17,376 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: 
java.lang.NullPointerException
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1113)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1125)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1028)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:205)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:613)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1009)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:827)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:365)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:353)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:254)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:434)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162)


> Hi,
> 
> We are on Hadoop 0.20.203 and use HDFS and MapRed for crawling using Apache
> Nutch. This morning we had our first encounter with an unwilling NameNode
> due to a very similar issue descriped on the list earlier [1].
> 
> We replaced with a backed up checkpoint, restarted the daemon and moved
> files with missing blocks to lost+found. Only transient data was affected.
> 
> Just now it happened again, same remedy and same results but it occured to
> me that there may be a reason besides coincidence or bad luck. After
> processing several GB's of prepared data we delete (-skipTrash) what we
> don't need and hurry on.
> 
> Prior to both corruptions a several GB's were deleted and HDFS stopped only
> seconds later. Is there a possible relation here? It's not something i can
> test because we are not willing to risk corruption of important data; we're
> moving it to other locations as we speak.
> 
> Any tips on how to prevent this from happening again or finding the source
> of the problem are very much appreciated.
> 
> [1]:
> http://hadoop-common.472056.n3.nabble.com/addChild-NullPointerException-
> when-starting-namenode-and-reading-edits-file-td92226.html
> 
> Thanks

Mime
View raw message