hadoop-common-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Pragmatic cluster backup strategies?
Date Tue, 29 May 2012 17:02:51 GMT
Yes, you will have redundancy, so no single point of hardware failure can wipe out your data,
short of a major catastrophe.  But you can still have an errant or malicious "hadoop fs -rm
-rf" shut you down.  If you still have the original source of your data somewhere else, you
may be able to recover by reprocessing it, but if this cluster is the single repository
for all your data, you may have a problem.
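
One partial safeguard for the accidental-delete case is to turn on the HDFS trash, so
anything removed with "hadoop fs -rm" sits in .Trash for a while before it is really gone.
A rough core-site.xml sketch (the retention value here is only an example, in minutes):

  <property>
    <!-- Keep shell-deleted files in .Trash for 24 hours before purging. -->
    <!-- 0 disables the trash entirely. -->
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>

Trash only covers deletes made through the fs shell, though; -skipTrash and deletes made
directly through the FileSystem API still remove data immediately, and it is no substitute
for a real copy somewhere else.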

--Bobby Evans

On 5/29/12 11:40 AM, "Michael Segel" <michael_segel@hotmail.com> wrote:

That's not a backup strategy.
You could still have joe luser take out a key file or directory. What do you do then?

On May 29, 2012, at 11:19 AM, Darrell Taylor wrote:

> Hi,
> We are about to build a 10-machine cluster with 40TB of storage; obviously,
> as this fills up, creating an offsite backup becomes a problem unless we
> build another 10-machine cluster (too expensive right now).  Not sure if it
> will help, but we have planned the cabinet as an upper and a lower half with
> separate redundant power, and we plan to put half of the cluster in the top
> and half in the bottom, effectively 2 racks, so in theory we could lose half
> the cluster and still have copies of all the blocks with a replication
> factor of 3?  Apart from the data centre burning down or some other disaster
> that would render the machines totally unrecoverable, is this approach good
> enough?
> I realise this is a very open question and everyone's circumstances are
> different, but I'm wondering what other people's experiences/opinions are
> on backing up cluster data?
> Thanks
> Darrell.
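
For the two-rack layout above to guarantee that losing one half still leaves a copy of
every block, HDFS also has to be told which nodes live in which rack; with the default,
rack-unaware setup every node sits in /default-rack and all three replicas of a block can
land in the same half.  Rack awareness is configured by pointing the NameNode at a topology
script.  A rough sketch, assuming hypothetical hostnames node01-node10 and the Hadoop 1.x
property name (it is net.topology.script.file.name in 2.x):

  <!-- core-site.xml: map each node to a rack so the block placement
       policy spreads replicas across both halves of the cabinet. -->
  <property>
    <name>topology.script.file.name</name>
    <value>/etc/hadoop/conf/rack-topology.py</value>
  </property>

  #!/usr/bin/env python
  # rack-topology.py -- hypothetical mapping of ten nodes to two racks.
  # Hadoop calls this with one or more node addresses as arguments and
  # expects one rack path per argument on stdout.
  # (Assumes nodes register by hostname; a production script would also
  # handle IP addresses.)
  import sys

  TOP_HALF = set(["node01", "node02", "node03", "node04", "node05"])

  for host in sys.argv[1:]:
      short = host.split(".")[0]
      print("/rack-top" if short in TOP_HALF else "/rack-bottom")

With that in place and a replication factor of 3, the default placement policy puts one
replica on the writer's rack and the other two on the other rack, so losing either half
still leaves at least one copy of each block.  As Bobby says above, though, that protects
against hardware failure, not against a bad delete or losing the whole site.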
