hadoop-hdfs-dev mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: corrupted edits log after power failure
Date Thu, 22 Sep 2011 19:15:25 GMT
Hi Gabi,

I'd be a bit scared of that backup strategy. What happens if the TCP connection gets cut suddenly
during curl?  What happens if the data gets corrupted in transit?  Such things have happened before.
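To make the first concern concrete, here is a rough sketch of a more defensive download wrapper. It only checks curl's exit status and moves the file into place once the transfer has finished; it does not detect corruption that slips through a completed transfer. The name safe_fetch and the temporary-file naming are made up for illustration and are not part of Hadoop.

```shell
# Fetch URL into DEST without ever leaving a partial file at DEST.
# safe_fetch is a hypothetical helper name, not a Hadoop or curl feature.
safe_fetch() {
    url=$1
    dest=$2
    tmp="$dest.tmp.$$"

    # -f: treat HTTP error responses (status >= 400) as failures.
    # -sS: no progress meter, but still print error messages.
    # curl also exits non-zero when the connection drops before the
    # advertised Content-Length has been received.
    if ! curl -fsS "$url" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # Refuse to keep an empty download.
    if [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        return 1
    fi

    # The temporary file sits next to DEST, so this rename stays on one
    # filesystem and never exposes a half-written file under DEST's name.
    mv "$tmp" "$dest"
}
```

With that in place, something like `safe_fetch 'http://nameNode:50070/getimage?getimage=1' fsimage || echo "fsimage backup failed"` keeps the previous good copy when a transfer fails. Note the quotes around the URL, so the shell does not treat the `?` as a glob character.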

Personally, we have the SecondaryNameNode (SNN) merge the edits every 15 minutes.  If a merge
hasn't happened in 30 minutes, people get emailed.  If it hasn't happened in 45 minutes, people get paged.
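The escalation rule is simple enough to sketch as a small shell function. This is only an illustration of the 30/45 minute thresholds mentioned above, not the monitoring setup actually in use; the function name and the idea of passing in epoch timestamps are assumptions.

```shell
# Print "ok", "email" or "page" depending on how long ago the last
# successful merge finished.
# Arguments: epoch seconds of the last merge, epoch seconds of "now".
# checkpoint_alert_level is a hypothetical name used for illustration.
checkpoint_alert_level() {
    last=$1
    now=$2
    age_min=$(( ($now - $last) / 60 ))

    if [ "$age_min" -ge 45 ]; then
        echo page       # no merge for 45 minutes or more
    elif [ "$age_min" -ge 30 ]; then
        echo email      # no merge for 30 minutes or more
    else
        echo ok
    fi
}
```

A cron job could feed it the modification time of the merged image and the current time, then act on the printed level. How you obtain those timestamps depends on your platform and directory layout.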

In addition to writing out copies to a few disks and to NFS, we also have a versioned backup
of the checkpoint.prev.
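By "versioned" I mean that each backup gets its own name rather than overwriting the last one, so a bad copy can't destroy a good one. A minimal sketch, assuming a timestamp suffix is an acceptable naming scheme; the function name and the ".partial" suffix are invented for this example, and pruning of old versions is left out.

```shell
# Copy SRC (a file or a directory) into BACKUP_DIR under a UTC-timestamped
# name and print the path of the new copy.
# versioned_copy is a hypothetical helper, not a Hadoop tool.
versioned_copy() {
    src=$1
    backup_dir=$2
    stamp=$(date -u +%Y%m%dT%H%M%SZ)
    dest="$backup_dir/$(basename "$src").$stamp"

    mkdir -p "$backup_dir" || return 1

    # Never overwrite an existing version.
    [ ! -e "$dest" ] || return 1

    # Copy under a temporary name first, so an interrupted copy is never
    # mistaken for a complete version.
    rm -rf "$dest.partial"
    cp -Rp "$src" "$dest.partial" || { rm -rf "$dest.partial"; return 1; }
    mv "$dest.partial" "$dest" || return 1

    echo "$dest"
}
```

Running this after each successful merge, against a directory on a different disk or on NFS, gives you a history of images to fall back on instead of a single copy.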

The worst case scenario would be if the SNN corrupts the image and uploads the corrupt image
(it's a theoretical situation so far...); this would be caught at the next merge, meaning
we trash up to 30 minutes of work.  This would ruin someone's day, but not someone's week.

The NN is a SPOF, and should be treated with an appropriate level of paranoia (and, because
it is a SPOF, assume that it will fail anyway and make sure you can accept the consequences).

Brian

On Sep 22, 2011, at 3:48 AM, Gabi Kazav wrote:

> Hi,
> 
> I had Power Failure.
> I have backup of files: edits, fsimage.
> 
> I am backing it up with:
> 
> curl -s http://nameNode:50070/getimage?getimage=1 > fsimage
> curl -s http://nameNode:50070/getimage?getedits=1 > edits
> 
> When I am trying to start the HDFS with the recovered files, I got error about the edits file: "Error replaying edit log at offset 1921"
> 
> Also, I have edits.new file, when I rename it to edits I got: "ERROR org.apache.hadoop.hdfs.server.common.Storage: Error replaying edit log at offset 2494103"
> 
> What is the problem?!
> 
> 
> And from now on, how can I do a backup that works?! :)
> 
> Thanks,
> Gabi.
> 
> 
> 
> 
> Gabi Kazav
> IT Manager And Infrastructure Engineer
> Gabi.Kazav@pursway.com | www.pursway.com
> Mailing address PO Box 4125, Herzliya 46140
> Address 8 Hamada St., Herzliya, IL | Tel +972 527 772457| Fax + 972 9 958 4736
> 

