hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Hadoop multi tier backup
Date Tue, 30 Aug 2011 20:13:19 GMT

Matthew, the short answer is hire a consultant to work with you on your DR/BCP strategy. :-)

Short of that... you have a couple of things...

Is your backup cluster at the same site? (What happens when the site goes down?)

Are you planning to make your backup cluster and main cluster homogeneous? By this I mean,
if your main cluster has 1PB of disk with 4x2TB or 4x3TB drives per node, will your backup
cluster have the same configuration?
(You may want to consider asymmetry in designing your clusters.) So your backup cluster could
have fewer nodes but more drives per node.
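
To make the asymmetry point concrete, here is a rough sizing sketch. It assumes 1PB means 1000TB of raw disk and ignores replication and overhead; the 12x3TB backup node layout and the function name are made up for illustration and do not come from the thread.

```python
import math

def nodes_needed(raw_capacity_tb, drives_per_node, drive_size_tb):
    """Number of nodes required to hold raw_capacity_tb of raw disk."""
    per_node_tb = drives_per_node * drive_size_tb
    return math.ceil(raw_capacity_tb / per_node_tb)

RAW_TB = 1000  # assumption: 1PB raw, decimal units, replication not counted

# Main cluster from the example above: 4 x 2TB drives per node.
main_nodes = nodes_needed(RAW_TB, drives_per_node=4, drive_size_tb=2)

# Hypothetical denser backup node: 12 x 3TB drives per node.
backup_nodes = nodes_needed(RAW_TB, drives_per_node=12, drive_size_tb=3)

print(main_nodes, backup_nodes)
```

Under these assumptions the backup cluster holds the same raw capacity with roughly a quarter of the nodes, at the cost of less compute and I/O per terabyte, which matters less on a cluster that mostly receives copies.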

You also have to look at your data. Are your data sets small and discrete? If so, you could
probably back them up to tape (snapshots), in case a human error isn't caught in time and
gets propagated to your backup cluster.

I haven't played with FUSE, so I don't know if there are any performance issues, but on a
backup cluster, I don't think it's much of an issue.
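
On the incremental question in the quoted message below, one possible approach is a sketch like the following. It assumes DistCp's `-update` option, which as far as I know skips files that already match at the destination, plus a hypothetical layout of one dated backup copy per day that gets pruned after the 60-day window. The cluster URIs and function names are placeholders, not anything from the thread.

```python
from datetime import date, timedelta

RETENTION_DAYS = 60  # the 60-day recovery window mentioned in the thread

def distcp_update_cmd(src, dst):
    """Build (but do not run) a DistCp command that copies only changed files."""
    return ["hadoop", "distcp", "-update", src, dst]

def expired(backup_dates, today, retention_days=RETENTION_DAYS):
    """Return the backup dates that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d < cutoff)

# Placeholder URIs; substitute the real NameNode addresses and paths.
cmd = distcp_update_cmd("hdfs://prod-nn/data", "hdfs://backup-nn/data")
print(" ".join(cmd))

# Hypothetical dated copies: anything older than the cutoff can be pruned.
old = expired([date(2011, 6, 1), date(2011, 8, 1)], today=date(2011, 8, 30))
print(old)
```

Note that `-update` on its own only keeps the backup in sync with production, so it will also carry over mistakes; the dated copies (or tape) are what would let you go back to an earlier state.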

> From: matthew.goeke@monsanto.com
> To: common-user@hadoop.apache.org; cdh-user@cloudera.org
> Subject: Hadoop multi tier backup
> Date: Tue, 30 Aug 2011 16:54:07 +0000
> All,
> We were discussing how we would back up our data from the various environments we will
> have, and I was hoping someone could chime in with previous experience in this. My primary
> concern about our cluster is that we would like to be able to recover anything within the
> last 60 days, so having full backups both on tape and through distcp is preferred.
> Our initial thoughts can be seen in the jpeg attached, but just in case any of you are
> wary of attachments, they are also summarized below:
> Prod Cluster --DistCp--> On-site Backup cluster with Fuse mount point running NetBackup
> daemon --NetBackup--> Media Server --> Tape
> One of our biggest grey areas so far is how most people accomplish incremental backups.
> Our thought was to tie this into our NetBackup configuration, as this can be done for other
> connectors, but we do not see anything for HDFS yet.
> Thanks,
> Matt