hadoop-mapreduce-user mailing list archives

From James Bond <bond.b...@gmail.com>
Subject Re: How to backup and Restore Hadoop 2.x ?
Date Wed, 09 Sep 2015 06:42:39 GMT
One way is to create a backup, or secondary, cluster:
1. Ingest data into both clusters in parallel, i.e. run the same jobs on
both clusters. This gives you a backup and also lets you switch over to the
backup cluster when you have trouble with the primary cluster. This setup
usually makes sense when you have two data centers, one being the primary
DC and the other the backup.
2. Have a primary cluster and a secondary that is kept in sync with the
primary, usually with distcp-type jobs. Cloudera provides a front end to
manage this replication, but it essentially runs distcp in the background.
3. If your data ingestion goes through Flume, Kafka, etc., you can use it to
write to both the primary and secondary clusters.
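
For option 2, a periodic sync job could look roughly like the sketch below. It only wraps the standard `hadoop distcp` command with its `-update` and `-delete` options. The NameNode addresses, the path, and the function names are made up for illustration, and the `hadoop` binary is assumed to be on the PATH of the machine that runs it.

```python
import subprocess
from typing import List


def build_distcp_command(source: str, target: str,
                         delete_missing: bool = False) -> List[str]:
    """Build a `hadoop distcp` command line that syncs source into target.

    -update copies only files that are missing or differ on the target.
    -delete (valid only together with -update or -overwrite) removes files
    from the target that no longer exist on the source.
    """
    cmd = ["hadoop", "distcp", "-update"]
    if delete_missing:
        cmd.append("-delete")
    cmd.extend([source, target])
    return cmd


def sync_clusters(source: str, target: str,
                  delete_missing: bool = False,
                  dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it and fail on a non-zero exit."""
    cmd = build_distcp_command(source, target, delete_missing)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical NameNode addresses and path; replace with your own.
    sync_clusters(
        "hdfs://primary-nn:8020/data/warehouse",
        "hdfs://backup-nn:8020/data/warehouse",
        delete_missing=True,
        dry_run=True,
    )
```

Scheduling something like this from cron or Oozie keeps the secondary close to the primary. Note that with `-delete`, an accidental deletion on the primary is propagated to the backup on the next run, so leave that flag off if the secondary also has to protect against user error.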

I am not sure whether anybody uses tape or another archive medium to back up
a Hadoop cluster. Perhaps somebody who does can comment.

On Wed, Sep 9, 2015 at 11:34 AM, Arthur Chan <arthur.hk.chan@gmail.com> wrote:

> Hi,
> Any idea how to back up and restore Hadoop 2.x? Use tape, form a new
> Hadoop cluster, or any other options?
> I use Hadoop 2.6 with HBase and Hive.
> Thanks
> Regards
