On Aug 3, 2010, at 9:12 AM, Eric Sammer wrote:


> All of that said, what you're protecting against here is permanent loss of a
> data center and human error. Disk, rack, and node level failures are already
> handled by HDFS when properly configured.

You've forgotten a third cause of loss: undiscovered software bugs.

The downside of spinning disks is that one completely fatal bug can destroy all your data in minutes (at my site, I famously deleted about 100TB in 10 minutes with a scratch-space cleanup script gone awry; that was one nasty bug). This is why we keep good backups.

If you're very, very serious about archiving and have a huge budget, you would invest a few million in tape silos at multiple sites, flip the write-protection tab on the tapes, eject them, and send them off to secure facilities. This isn't for everyone though :)