hadoop-hdfs-user mailing list archives

From Adam Phelps <...@opendns.com>
Subject Re: Failed namenode restart, recovering from corrupt edits file?
Date Wed, 12 Jan 2011 21:36:05 GMT
On 1/12/11 1:05 PM, Friso van Vollenhoven wrote:
> If I am correct your proposed solution would set you back to an image
> from about 15-30 minutes before the crash. I think it depends on what
> you do with your HDFS (HBase, append only things, ?), whether that will
> work out. In our case we are running HBase and going back in time with
> the NN image is not very helpful then, because of splits and compactions
> removing and adding files all the time. On append only workloads where
> you have the option of redoing whatever it is that you did just before
> the time of the crash, this could work. But, please verify with someone
> with a better understanding of HDFS internals.

We do run HBase.  It's our desire to avoid trashing the intervening data;
however, ditching the particular MR output files that show up in the
error would be fine.

> Also, there apparently is a way of healing a corrupt edits file using
> your favorite hex editor. There is a thread here:
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201010.mbox/%3CAANLkTinBHmn1X8DLir-c4iBhjA9nh46tnS588CQCNv1h@mail.gmail.com%3E

Thanks for the link.  Manually editing the edits file is our current
plan; a little understanding of the format should save us some pain.

> There is a thread about this (our) problem on the cdh-user Google group.
> You could also try to post there.

Thanks, I'll go take a look there.

- Adam
