hadoop-mapreduce-user mailing list archives

From Bennie Schut <bsc...@ebuddy.com>
Subject RE: HDFS snapshots restore
Date Fri, 29 Nov 2013 07:46:26 GMT
Hi Juan,

In addition to Binglin Chang's reply: whether you snapshot or manually copy the data, you
need to understand a little about how Hive works to be able to do a correct restore.
Hive keeps its metadata in a separate database. For example, if you have a table with a date
partition, Hive uses the metadata to know which partitions exist. Say you have
these partitions on HDFS:
/user/hive/warehouse/table/logindate=2013-11-26
/user/hive/warehouse/table/logindate=2013-11-27
/user/hive/warehouse/table/logindate=2013-11-28

If you drop partition "2013-11-27", Hive also removes the metadata reference. So if you restore
the data, the partition will exist on HDFS, but you still need to run some "add partition" commands
before Hive knows the partition exists.
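
As an illustration of those "add partition" commands, here is a minimal sketch that turns restored
partition directories into Hive DDL. The function name partition_ddl and the table name mytable are
hypothetical; the paths are the ones from the example above. It only prints statements, so they can
be reviewed before being fed to Hive.

```shell
#!/bin/sh
# Read partition directory paths (one per line) on stdin and print one
# Hive "ADD PARTITION" statement per path. $1 is the Hive table name.
# Assumes single-level partitions laid out as .../key=value.
partition_ddl() {
    table="$1"
    while IFS= read -r path; do
        # Last path component, e.g. logindate=2013-11-27
        spec="${path##*/}"
        case "$spec" in
            *=*) ;;          # looks like key=value, keep it
            *) continue ;;   # anything else is not a partition directory
        esac
        key="${spec%%=*}"
        value="${spec#*=}"
        printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s') LOCATION '%s';\n" \
            "$table" "$key" "$value" "$path"
    done
}

# Example: the partition that was dropped and then restored on HDFS.
printf '%s\n' /user/hive/warehouse/table/logindate=2013-11-27 | partition_ddl mytable
```

The printed statements could then be saved to a file and run with "hive -f", once you have checked
that they match the partitions you actually restored.
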
It's usually a good idea to snapshot the metadata at the same time you snapshot the HDFS data,
so you get one consistent view which you can trust to be correct.
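
One way to keep the two in step is to take both under a shared label. The sketch below only prints
the commands rather than running them. It assumes the metastore lives in a MySQL database named
"metastore" (neither is stated in this thread) and that an administrator has already made the
warehouse directory snapshottable; backup_cmds is a hypothetical helper name.

```shell
#!/bin/sh
# Print the two commands that capture HDFS data and Hive metadata under
# one label. $1 is the snapshottable HDFS directory, $2 is the label.
# Assumption: the metastore is a MySQL database called "metastore".
# Prerequisite (admin): hdfs dfsadmin -allowSnapshot <directory>
backup_cmds() {
    warehouse="$1"
    label="$2"
    printf 'hdfs dfs -createSnapshot %s %s\n' "$warehouse" "$label"
    printf 'mysqldump --single-transaction metastore > metastore-%s.sql\n' "$label"
}

# Example: label both backups with today's date.
backup_cmds /user/hive/warehouse "backup-$(date +%Y%m%d)"
```

Writes to the table between the two commands can still make the views diverge, so it is safest to
run them while nothing is loading data.
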

Bennie.

From: Binglin Chang [mailto:decstery@gmail.com]
Sent: Thursday, November 28, 2013 4:27 PM
To: user@hadoop.apache.org
Subject: Re: HDFS snapshots restore

The snapshot restore feature is not implemented yet. Currently you can use distcp to copy the snapshot
dir to your new cluster. Suppose your Hive dir is /user/hive/ and the snapshot dir is /user/hive/.snapshot/sn0;
you can run:
 hadoop distcp hdfs://oldcluster:8020/user/hive/.snapshot/sn0 hdfs://newcluster:8020/somedir
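
The snapshot path is just the snapshotted directory, then ".snapshot", then the snapshot name. A
small sketch of how the copy command is assembled, using the host names and paths from the example
above; snapshot_copy_cmd is a hypothetical helper, and it prints the command instead of running it:

```shell
#!/bin/sh
# Build the distcp command that copies one snapshot to another cluster.
# $1 source namenode URI, $2 snapshotted directory, $3 snapshot name,
# $4 destination URI. The command is printed, not executed.
snapshot_copy_cmd() {
    src_nn="$1"
    dir="$2"
    snap="$3"
    dest="$4"
    printf 'hadoop distcp %s%s/.snapshot/%s %s\n' "$src_nn" "$dir" "$snap" "$dest"
}

# Example with the values used in this thread.
snapshot_copy_cmd hdfs://oldcluster:8020 /user/hive sn0 hdfs://newcluster:8020/somedir
```
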


On Thu, Nov 28, 2013 at 9:47 PM, Juan Martin Pampliega <jpampliega@gmail.com>
wrote:
Hi,

I have read the documentation about HDFS snapshots for hadoop 2 (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html)
but it is still not clear to me how to use these snapshots to restore the data.

Let's say I have a directory with the data corresponding to a Hive table that I want to back up.
I take a snapshot today and tomorrow I find out that the modifications done to the table/directory
after the snapshot are wrong and I want to revert the directory to the snapshot state. How
do I achieve this?

Also, can I extract the snapshot from HDFS, save it in external storage, and later use
it to restore this directory in a new, empty cluster? Or what is the recommended way to do
this?


Thanks,
Juan.
