hadoop-common-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject NN fails to start with LeaseManager errors
Date Tue, 02 Feb 2010 21:54:58 GMT
Hi,

This morning the namenode of my Hadoop cluster shut itself down after the
logs/ directory filled up with job configs, log files and all the other fun
things Hadoop leaves there. It had been running for a few months. I deleted
all of the job configs and attempt log directories and tried to restart the
namenode, but startup failed after logging many LeaseManager errors.

Does anyone know what needs to be done to fix this and get the namenode back
up?

Here's what the logs report. I'm using Cloudera's 0.18.3 distro.

STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = my-host-name.com/10.15.137.204
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.18.3-2
STARTUP_MSG:   build =  -r ; compiled by 'httpd' on Fri Jun 12 15:27:43 PDT 2009
************************************************************/
2010-02-02 13:38:31,199 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2010-02-02 13:38:31,208 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: my-host-name.com/10.15.137.204:9000
2010-02-02 13:38:31,212 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-02-02 13:38:31,218 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-02-02 13:38:31,318 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=app,app
2010-02-02 13:38:31,319 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
2010-02-02 13:38:31,319 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
2010-02-02 13:38:31,329 INFO org.apache.hadoop.dfs.FSNamesystemMetrics: Initializing FSNamesystemMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-02-02 13:38:31,331 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean
2010-02-02 13:38:31,375 INFO org.apache.hadoop.dfs.Storage: Number of files = 248675
2010-02-02 13:38:36,932 INFO org.apache.hadoop.dfs.Storage: Number of files under construction = 2
2010-02-02 13:38:37,008 INFO org.apache.hadoop.dfs.Storage: Image file of size 42924164 loaded in 5 seconds.
2010-02-02 13:38:37,020 ERROR org.apache.hadoop.dfs.LeaseManager: /path/on/hdfs/_logs/history/my-host-name.com_1261508934685_job_200912221108_15967_conf.xml not found in lease.paths (=[/path/on/hdfs/_logs/history/my-host-name.com_1261508934685_job_200912221108_15967_app_MyJobName_20100202_10_59])

[[ a bunch more errors like the one above ]]

2010-02-02 13:38:37,076 ERROR org.apache.hadoop.fs.FSNamesystem: FSNamesystem initialization failed.
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:585)
        at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
        at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
        at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
        at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
        at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
        at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
2010-02-02 13:38:37,077 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
2010-02-02 13:38:37,081 ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:585)
        at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
        at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
        at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
        at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
        at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
        at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)

2010-02-02 13:38:37,082 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at my-host-name.com/10.15.137.204
************************************************************/
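The top frame of both stack traces is java.io.DataInputStream.readInt, called from FSEditLog.loadFSEdits. readInt needs four bytes, and it throws java.io.EOFException when the stream ends before it gets them. The minimal Java sketch below reproduces only that JDK behavior on in-memory byte arrays. The class and method names (ReadIntEofDemo, tryReadInt, countCompleteInts) are hypothetical, and this is not Hadoop's edit-log code. One possible reading of the trace, which is an assumption the log does not confirm, is that the namenode's edits file ends partway through a record. That could happen, for example, if a write was cut short when the disk filled.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class: shows how DataInputStream.readInt reacts to a short stream.
public class ReadIntEofDemo {

    // Reads one big-endian int; returns null if fewer than four bytes are available.
    static Integer tryReadInt(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readInt();
        } catch (EOFException e) {
            return null; // stream ended before four bytes could be read
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts how many complete 4-byte ints can be read before the stream runs out.
    static int countCompleteInts(byte[] bytes) {
        int count = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            while (true) {
                in.readInt();
                count++;
            }
        } catch (EOFException e) {
            return count; // a trailing partial int (if any) triggers this
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryReadInt(new byte[] {0, 0, 1, 0})); // 0x00000100
        System.out.println(tryReadInt(new byte[] {0, 0}));       // only 2 of 4 bytes
        System.out.println(countCompleteInts(new byte[6]));      // one int plus 2 stray bytes
    }
}
```

Running main prints 256, null and 1. The six-byte array holds one complete int, and the second readInt fails on the two leftover bytes, which mirrors the kind of failure shown at DataInputStream.java:375 above.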

thanks,
Bill
