hadoop-user mailing list archives

From "rongshen.long" <rongshen.l...@baifendian.com>
Subject Backup node crashed with NPE and failed to restart
Date Mon, 22 Oct 2012 14:55:35 GMT
Hi,
I tried to run a backup node on HDFS 0.21, but the daemon crashed with a
NullPointerException (stack trace below) and left an 'edits.new' file in the
$dfs.namenode.name.dir/current directory. After that, I could not restart either
the namenode or the backup node; both fail with the same exception.
Could anyone help me recover the cluster? The NN can be restarted by creating
an empty 'edits' file, but much data would be lost.

12/10/09 15:32:45 ERROR namenode.Checkpointer: Throwable Exception in doCheckpoint: 
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
        at org.apache.hadoop.hdfs.server.namenode.BackupStorage.loadCheckpoint(BackupStorage.java:158)
        at org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:243)
        at org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)
12/10/09 15:32:45 WARN namenode.FSNamesystem: ReplicationMonitor thread received InterruptedException.java.lang.InterruptedException: sleep interrupted
12/10/09 15:32:45 WARN namenode.DecommissionManager: Monitor interrupted: java.lang.InterruptedException: sleep interrupted
12/10/09 15:32:45 INFO namenode.FSNamesystem: Number of transactions: 24 Total time for transactions(ms): 4Number of transactions batched in Syncs: 0 Number of syncs: 25 SyncTimes(ms): 239
12/10/09 15:32:45 INFO ipc.Server: Stopping server on 50100
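
Since the empty-'edits' workaround mentioned above discards whatever is in the existing edit files, it seems prudent to copy the 'current' directory somewhere safe before touching anything. The following is only a minimal sketch of that precaution, not a recovery procedure: the function name backup_name_dir and both path arguments are hypothetical, and it assumes the namenode and backup node are stopped while the copy runs.

```python
import shutil
import time
from pathlib import Path


def backup_name_dir(name_dir, backup_root):
    """Copy <name_dir>/current into a timestamped folder under backup_root.

    name_dir is assumed to be the value of dfs.namenode.name.dir;
    backup_root is any directory with enough free space (hypothetical).
    Returns the path of the new copy.
    """
    src = Path(name_dir) / "current"
    if not src.is_dir():
        raise FileNotFoundError(f"no 'current' directory under {name_dir}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"current-{stamp}"

    # copytree creates missing parent directories and raises
    # FileExistsError if dest already exists, so an earlier copy
    # is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

With a copy like this in place, the original 'edits' and 'edits.new' files remain available for later inspection even if the workaround is applied to the live directory.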




2012-10-22
rongshen.long