hadoop-common-user mailing list archives

From Andreas Kostyrka <andr...@kostyrka.org>
Subject Re: critical name node problem
Date Fri, 05 Sep 2008 12:53:49 GMT
Ok, after googling around a bit, the solution seems to be either to delete
the edits file, which in my case would be non-cool (24MB worth of edits in
there), or to truncate it at the right point.

So I used the following script to figure out how many bytes need to be
dropped from the end:

LEN=25497570   # size of the edits file, in bytes

while true
do
   # restore only the first LEN bytes of the backed-up edits file
   dd if=edits.org of=edits bs=$LEN count=1
   time hadoop namenode
   # a failure in loadFSEdits makes the namenode exit with status 255
   if [[ $? -ne 255 ]]
   then
      echo "$LEN seems to have worked."
      exit 0
   fi
   LEN=$(expr $LEN - 1)   # drop one more byte from the end and retry
done
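Shrinking one byte at a time is slow for a 24MB file; the same idea can be bisected on the length instead. A minimal sketch below, with a hypothetical demo file and a mock `valid` check standing in for the real "does `hadoop namenode` still exit with 255?" test:

```shell
#!/bin/sh
# Hypothetical stand-in for the edits log: a valid prefix ("A" bytes)
# followed by a corrupt tail.
printf 'AAAAAAAAAAXXXX' > edits.demo

# valid LEN -- exit 0 if the first LEN bytes pass the (mock) check.
# In the real case this would be: dd the first LEN bytes into place,
# start the namenode, and treat anything other than exit 255 as valid.
valid() {
    ! head -c "$1" edits.demo | grep -q '[^A]'
}

lo=0
hi=$(wc -c < edits.demo)
# Find the largest LEN whose prefix still validates.
while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))
    if valid "$mid"; then
        lo=$mid             # prefix of length mid is fine; search higher
    else
        hi=$(( mid - 1 ))   # corruption within first mid bytes; search lower
    fi
done
echo "longest valid prefix: $lo bytes"
```

This needs only about log2(filesize) namenode start attempts instead of one per truncated byte, which matters when each attempt replays megabytes of edits.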

I guess something like this might make sense to add to
http://wiki.apache.org/hadoop/TroubleShooting
since not everyone will be able to figure out how to get rid of the "last"
incomplete record.

Another idea would be a tool, or a namenode startup mode, that ignores
EOFExceptions so as to recover as much of the edit log as possible.

Andreas

On Friday 05 September 2008 13:30:34 Andreas Kostyrka wrote:
> Hi!
>
> My namenode has run out of space, and now I'm getting the following:
>
> 08/09/05 09:23:22 WARN dfs.StateChange: DIR* FSDirectory.unprotectedDelete:
> failed to
> remove /data_v1/2008/06/26/12/pub1-access-2008-06-26-11_52_07.log.gz
> because it does not exist
> 08/09/05 09:23:22 INFO ipc.Server: Stopping server on 9000
> 08/09/05 09:23:22 ERROR dfs.NameNode: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
>         at
> org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
>         at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:441)
>         at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:766)
>         at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:640)
>         at
> org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:223)
>         at
> org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80) at
> org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:274)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:255)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>
> 08/09/05 09:23:22 INFO dfs.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at
> ec2-67-202-42-251.compute-1.amazonaws.com/10.251.39.196
>
> hadoop-0.17.1 btw.
>
> What do I do now?
>
> Andreas


