hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: Backing up HDFS
Date Tue, 03 Aug 2010 14:42:17 GMT

On Aug 3, 2010, at 9:12 AM, Eric Sammer wrote:

<snip/>
> 
> All of that said, what you're protecting against here is permanent loss of a
> data center and human error. Disk, rack, and node level failures are already
> handled by HDFS when properly configured.
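
(Agreed on all of that. For anyone following along, "properly configured" mostly comes
down to a replication factor of at least three plus a rack topology script, so replicas
land on more than one rack. A minimal sketch with the 0.20-era property names; the script
path here is just an example:

    <!-- hdfs-site.xml: keep three copies of every block (the default) -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>

    <!-- core-site.xml: map each datanode address to a rack so the
         namenode spreads replicas across racks -->
    <property>
      <name>topology.script.file.name</name>
      <value>/etc/hadoop/conf/rack-topology.sh</value>
    </property>

The script takes datanode IPs as arguments and prints a rack path, e.g. /dc1/rack7,
for each one.)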

You've forgotten a third cause of loss: undiscovered software bugs.

The downside of spinning disks is that one completely fatal bug can destroy all your data in
about a minute (at my site, I famously deleted about 100TB in 10 minutes with a scratch-space
cleanup script gone awry; that was one nasty bug).  This is why we keep good backups.
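
By "good backups" I mean, at a minimum, enabling the trash so an errant delete is
recoverable for a while, plus a periodic distcp to a second cluster that the first
cluster's scripts can't touch. A rough sketch; the hostnames and paths are made up:

    <!-- core-site.xml: keep deleted files under .Trash for a day
         before they are really gone (interval is in minutes) -->
    <property>
      <name>fs.trash.interval</name>
      <value>1440</value>
    </property>

    # run nightly from cron: copy only new or changed files
    hadoop distcp -update hdfs://prod-nn:8020/data \
        hdfs://backup-nn:8020/backups/data

One caveat: the trash only catches deletes issued through the fs shell; a program calling
FileSystem.delete() directly bypasses it, so it's no substitute for the second copy.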

If you're very, very serious about archiving and have a huge budget, you could invest a few
million in a tape silo at multiple sites, flip the write-protection tabs on the tapes, eject
them, and send them off to secure facilities.  This isn't for everyone, though :)

Brian