hadoop-common-user mailing list archives

From Robert Bart <rb...@cs.washington.edu>
Subject NullPointerException on namenode startup
Date Tue, 15 May 2012 06:13:07 GMT
Hi all,

I have a cluster of 9 nodes running Hadoop 1.0.2. I am unable to bring
the cluster back up after an unsuccessful attempt to restart it. The
problem appears to be caused by a NullPointerException being thrown during
startup of the namenode. The exception seems to be thrown while reading one
of the binary files "edits" or "edits.new" in dfs.name.dir (stack trace
below).

Googling turns up reports of similar problems, and the most common solution
seems to require manually repairing one or both of these files:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200912.mbox/%3Ca03ecfdb0912160654g4d2f6bd0i73bd690bda55b753@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201205.mbox/%3C4FA3DFFA.9080507@bnl.gov%3E
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201010.mbox/%3C83E09645-A671-4DCE-89A6-D0E1952190A9@mlogiciels.com%3E

Does this indeed appear to be corruption of the files in dfs.name.dir, or
is it possible that something else is going on? If the former, is manually
repairing the files using a hex editor really the best way to proceed? Any
ideas on how to fix this without losing data would be greatly appreciated!
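
Whatever the repair turns out to be, copying dfs.name.dir aside before touching "edits" or "edits.new" seems prudent, since a failed hex edit would otherwise be unrecoverable. A minimal Python sketch of that step, assuming the namenode process is stopped; the function name backup_name_dir and both path arguments are hypothetical and not part of Hadoop:

```python
import shutil
import time
from pathlib import Path


def backup_name_dir(name_dir, backup_root):
    """Copy the whole name directory into a timestamped folder; return the new path."""
    src = Path(name_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"not a directory: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    # copytree raises FileExistsError if dest already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(src, dest)
    return dest
```

It would be called with the value configured for dfs.name.dir and any directory that has enough free space, and the returned path is where the untouched copy lives. If dfs.name.dir lists several directories, each one would need its own call.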

Here is the relevant part of the namenode log:

************************************************************/
2012-05-14 21:29:05,459 INFO hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = rv-n11.cs.washington.edu/128.95.1.25
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.3-SNAPSHOT
STARTUP_MSG:   build =  -r ; compiled by 'knowall' on Tue Apr 17 10:43:14
PDT 2012
************************************************************/
2012-05-14 21:29:05,608 INFO metrics2.impl.MetricsConfig: loaded properties
from hadoop-metrics2.properties
2012-05-14 21:29:05,625 INFO metrics2.impl.MetricsSourceAdapter: MBean for
source MetricsSystem,sub=Stats registered.
2012-05-14 21:29:05,626 INFO metrics2.impl.MetricsSystemImpl: Scheduled
snapshot period at 10 second(s).
2012-05-14 21:29:05,626 INFO metrics2.impl.MetricsSystemImpl: NameNode
metrics system started
2012-05-14 21:29:05,754 INFO metrics2.impl.MetricsSourceAdapter: MBean for
source ugi registered.
2012-05-14 21:29:05,766 INFO metrics2.impl.MetricsSourceAdapter: MBean for
source jvm registered.
2012-05-14 21:29:05,768 INFO metrics2.impl.MetricsSourceAdapter: MBean for
source NameNode registered.
2012-05-14 21:29:05,802 INFO hdfs.util.GSet: VM type       = 64-bit
2012-05-14 21:29:05,802 INFO hdfs.util.GSet: 2% max memory = 17.77875 MB
2012-05-14 21:29:05,803 INFO hdfs.util.GSet: capacity      = 2^21 = 2097152
entries
2012-05-14 21:29:05,803 INFO hdfs.util.GSet: recommended=2097152,
actual=2097152
2012-05-14 21:29:05,825 INFO hdfs.server.namenode.FSNamesystem:
fsOwner=knowall
2012-05-14 21:29:05,825 INFO hdfs.server.namenode.FSNamesystem:
supergroup=supergroup
2012-05-14 21:29:05,825 INFO hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2012-05-14 21:29:05,833 INFO hdfs.server.namenode.FSNamesystem:
dfs.block.invalidate.limit=100
2012-05-14 21:29:05,833 INFO hdfs.server.namenode.FSNamesystem:
isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
accessTokenLifetime=0 min(s)
2012-05-14 21:29:06,005 INFO hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStateMBean and NameNodeMXBean
2012-05-14 21:29:06,032 INFO hdfs.server.namenode.NameNode: Caching file
names occuring more than 10 times
2012-05-14 21:29:06,044 INFO hdfs.server.common.Storage: Number of files =
52617
2012-05-14 21:29:07,319 INFO hdfs.server.common.Storage: Number of files
under construction = 34
2012-05-14 21:29:07,328 INFO hdfs.server.common.Storage: Image file of size
15201146 loaded in 1 seconds.
2012-05-14 21:29:07,334 ERROR hdfs.server.namenode.NameNode:
java.lang.NullPointerException
        at hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1094)
        at hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1106)
        at hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1009)
        at hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:208)
        at hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:626)
        at hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1015)
        at hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:833)
        at hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
        at hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
        at hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
        at hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
        at hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
        at hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
        at hdfs.server.namenode.NameNode.main(NameNode.java:1288)

2012-05-14 21:29:07,334 INFO hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at rv-n11.cs.washington.edu/128.95.1.25
************************************************************/


-- 
Rob
