hadoop-common-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: NN fails to start with LeaseManager errors
Date Tue, 02 Feb 2010 23:29:34 GMT
I was able to fix this by restoring my namenode from the last checkpoint of
the secondary namenode. Searching the list, I saw that others have struggled
with this issue, so I'll share my steps.

I did it by following Tom White's excellent instructions in Hadoop: The
Definitive Guide:

1. Stopped the secondary namenode. (The namenode was already stopped.)
2. Moved my namenode directory (configured as dfs.name.dir) aside.
3. Started the namenode with the -importCheckpoint option, like so:

bin/hadoop-daemon.sh start namenode -importCheckpoint
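
For anyone who wants the end-to-end sequence, here's a minimal sketch. The
paths are assumptions on my part (dfs.name.dir pointing at /data/dfs/name,
commands run from the Hadoop install root), so substitute your own:

# Stop the secondary namenode; the namenode itself is already down.
bin/hadoop-daemon.sh stop secondarynamenode

# Move the damaged namenode directory aside rather than deleting it,
# in case you need to pull anything back out of it later.
# (/data/dfs/name is a hypothetical dfs.name.dir -- use yours.)
mv /data/dfs/name /data/dfs/name.broken

# Start the namenode from the secondary's last checkpoint. This should
# read the checkpoint from fs.checkpoint.dir and write a fresh image
# into dfs.name.dir.
bin/hadoop-daemon.sh start namenode -importCheckpoint

Once it's back up, running bin/hadoop fsck / is a quick sanity check on the
restored image. Keep in mind that anything written to HDFS after the
secondary's last checkpoint is lost with this approach.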



On Tue, Feb 2, 2010 at 1:54 PM, Bill Graham <billgraham@gmail.com> wrote:

> Hi,
>
> This morning the namenode of my hadoop cluster shut itself down after the
> logs/ directory had filled up with job configs, log files and all the
> other fun things hadoop leaves there. It had been running for a few months.
> I deleted all of the job configs and attempt log directories and tried to
> restart the namenode, but it failed due to many LeaseManager errors.
>
> Does anyone know what needs to be done to fix this and get the namenode
> back up?
>
> Here's what the logs report. I'm using Cloudera's 0.18.3 distro.
>
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = my-host-name.com/10.15.137.204
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.18.3-2
> STARTUP_MSG:   build =  -r ; compiled by 'httpd' on Fri Jun 12 15:27:43 PDT
> 2009
> ************************************************************/
> 2010-02-02 13:38:31,199 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=9000
> 2010-02-02 13:38:31,208 INFO org.apache.hadoop.dfs.NameNode: Namenode up
> at: my-host-name.com/10.15.137.204:9000
> 2010-02-02 13:38:31,212 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2010-02-02 13:38:31,218 INFO org.apache.hadoop.dfs.NameNodeMetrics:
> Initializing NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2010-02-02 13:38:31,318 INFO org.apache.hadoop.fs.FSNamesystem:
> fsOwner=app,app
> 2010-02-02 13:38:31,319 INFO org.apache.hadoop.fs.FSNamesystem:
> supergroup=supergroup
> 2010-02-02 13:38:31,319 INFO org.apache.hadoop.fs.FSNamesystem:
> isPermissionEnabled=true
> 2010-02-02 13:38:31,329 INFO org.apache.hadoop.dfs.FSNamesystemMetrics:
> Initializing FSNamesystemMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2010-02-02 13:38:31,331 INFO org.apache.hadoop.fs.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2010-02-02 13:38:31,375 INFO org.apache.hadoop.dfs.Storage: Number of files
> = 248675
> 2010-02-02 13:38:36,932 INFO org.apache.hadoop.dfs.Storage: Number of files
> under construction = 2
> 2010-02-02 13:38:37,008 INFO org.apache.hadoop.dfs.Storage: Image file of
> size 42924164 loaded in 5 seconds.
> 2010-02-02 13:38:37,020 ERROR org.apache.hadoop.dfs.LeaseManager:
> /path/on/hdfs/_logs/history/my-host-name.com_1261508934685_job_200912221108_15967_conf.xml
> not found in lease.paths
> (=[/path/on/hdfs/_logs/history/my-host-name.com_1261508934685_job_200912221108_15967_app_MyJobName_20100202_10_59])
>
> [[ a bunch more errors like the one above ]]
>
> 2010-02-02 13:38:37,076 ERROR org.apache.hadoop.fs.FSNamesystem:
> FSNamesystem initialization failed.
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:585)
>         at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
>         at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
>         at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
>         at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
>         at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
> 2010-02-02 13:38:37,077 INFO org.apache.hadoop.ipc.Server: Stopping server
> on 9000
> 2010-02-02 13:38:37,081 ERROR org.apache.hadoop.dfs.NameNode:
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:585)
>         at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
>         at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
>         at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
>         at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
>         at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
>
> 2010-02-02 13:38:37,082 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at my-host-name.com/10.15.137.204
> ************************************************************/
>
> thanks,
> Bill
>
