hadoop-hdfs-user mailing list archives

From: Gerrit Jansen van Vuuren <gerrit...@googlemail.com>
Subject: Re: HDFS will not start after Namenode ran out of disk space
Date: Wed, 29 Sep 2010 07:13:28 GMT
Hi,

We had the same problem on our hadoop cluster a year ago. What
happened is that the edit log for the namenode metadata became
corrupt, and from your stack trace that is what has happened to your
cluster.
Do you have any backup copies of this data?

If not, the only option you have is to start over and reload the data :(

What we've done to prevent this:
- Nagios monitoring of namenode disk space
- No one should ever do any work on the namenode; that box is for the
namenode and jobtracker only (if the jobtracker runs on the same box).
- Run the secondary namenode. You MUST run this, and I'll explain why:
the namenode maintains two sets of files, the image and the edit log.
The namenode will only ever merge the edit log into the image when:
1. It is restarted, or
2. The secondary namenode asks it to checkpoint.

This is why, if you do not run the secondary namenode, your namenode's
edit log will grow and grow. When something like a full disk happens,
the smaller your edit log is, the less data you will lose. See the
config sketch below.
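
For reference, here is a minimal sketch of the properties involved,
using the pre-0.20 names that match the org.apache.hadoop.dfs packages
in your stack trace. The paths are made up for illustration, and the
exact file (hadoop-site.xml on old releases, hdfs-site.xml and
core-site.xml later) depends on your version, so check it against your
own docs rather than pasting it in as-is:

<configuration>

  <!-- Keep the image and edit log in more than one directory, ideally
       including an NFS mount, so one bad disk is not your only copy
       of the metadata. These paths are examples only. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/dfs/name,/mnt/nfs/dfs/name</value>
  </property>

  <!-- How often, in seconds, the secondary namenode triggers a
       checkpoint, i.e. a merge of the edit log into the image. -->
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
  </property>

  <!-- Also checkpoint whenever the edit log reaches this size in
       bytes (64 MB here), regardless of the period. -->
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
  </property>

</configuration>

With regular checkpoints the edit log stays small, so a crash or a
full disk costs you at most the edits written since the last merge.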

Cheers,
Gerrit

On 28/09/2010, Ayon Sinha <ayonsinha@yahoo.com> wrote:
> Our Namenode ran out of disk space and became unresponsive. I tried to
> bounce DFS and it fails with the following error. How do I recover HDFS,
> even at the cost of losing a whole bunch of recent edits?
>
> 2010-09-28 00:45:00,777 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
> 2010-09-28 00:45:00,783 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: xxxx.yyyy.zzzz.com/aa.bb.ccc.dd:9000
> 2010-09-28 00:45:00,786 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2010-09-28 00:45:00,790 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2010-09-28 00:45:00,843 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=xxxxxx,xxxxx
> 2010-09-28 00:45:00,843 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
> 2010-09-28 00:45:00,843 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=false
> 2010-09-28 00:45:00,850 INFO org.apache.hadoop.dfs.FSNamesystemMetrics: Initializing FSNamesystemMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2010-09-28 00:45:00,851 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean
> 2010-09-28 00:45:00,880 INFO org.apache.hadoop.dfs.Storage: Number of files = 682462
> 2010-09-28 00:45:09,060 INFO org.apache.hadoop.dfs.Storage: Number of files under construction = 4
> 2010-09-28 00:45:09,132 INFO org.apache.hadoop.dfs.Storage: Image file of size 81850130 loaded in 8 seconds.
> 2010-09-28 00:45:09,144 INFO org.apache.hadoop.dfs.Storage: Edits file edits of size 15190 edits # 71 loaded in 0 seconds.
> 2010-09-28 00:45:09,147 ERROR org.apache.hadoop.fs.FSNamesystem: FSNamesystem initialization failed.
> java.io.EOFException
>         at java.io.DataInputStream.readFully(Unknown Source)
>         at java.io.DataInputStream.readLong(Unknown Source)
>         at org.apache.hadoop.dfs.Block.readFields(Block.java:126)
>         at org.apache.hadoop.dfs.FSEditLog.readBlocks(FSEditLog.java:1127)
>         at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:456)
>         at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:849)
>         at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
>         at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
>         at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
>         at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
> 2010-09-28 00:45:09,148 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2010-09-28 00:45:09,149 ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
>         at java.io.DataInputStream.readFully(Unknown Source)
>         at java.io.DataInputStream.readLong(Unknown Source)
>         at org.apache.hadoop.dfs.Block.readFields(Block.java:126)
>         at org.apache.hadoop.dfs.FSEditLog.readBlocks(FSEditLog.java:1127)
>         at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:456)
>         at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:849)
>         at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
>         at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
>         at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
>         at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
>
> 2010-09-28 00:45:09,149 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
> -Ayon

-- 
Sent from my mobile device
