hbase-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject A list of HBase backup options
Date Thu, 10 Mar 2011 19:33:31 GMT

I've got some data in HBase that I'd hate to lose.  Yeah, very original. :))
I know I can:
1) make an export/backup of one table at a time using
org.apache.hadoop.hbase.mapreduce.Export from HBASE-1684
2) copy one table at a time using
org.apache.hadoop.hbase.mapreduce.CopyTable
3) use distcp to copy the whole /hbase part of HDFS
4) replicate the whole cluster - http://hbase.apache.org/replication.html
5) count on HDFS replication and live without the standard backup
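For reference, the invocations for options 1-3 look roughly like the sketch below. Table names, paths, hostnames, and the ZooKeeper quorum spec are all placeholders, not anything from a real cluster:

```shell
# Option 1: per-table export to HDFS sequence files (HBASE-1684).
# 'mytable' and the output path are placeholders.
hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backups/mytable

# Option 2: per-table copy to another cluster via CopyTable.
# --peer.adr takes the destination cluster's ZooKeeper quorum spec
# (host:port:znode); 'backup-zk' here is hypothetical.
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --peer.adr=backup-zk:2181:/hbase mytable

# Option 3: raw distcp of the whole /hbase directory between clusters.
# Only a consistent copy if HBase is shut down (or quiesced) first.
hadoop distcp hdfs://src-nn:8020/hbase hdfs://backup-nn:8020/hbase
```
These are command fragments that need a running Hadoop/HBase cluster, so no self-contained test is attached.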

What I'm not sure about is the following:

1) Are any of the above options "hot", meaning they can be run while the
source cluster is serving traffic and still produce a consistent backup (a
snapshot or checkpoint of the source cluster's data)?
I imagine only replication of the whole cluster (option 4 above) really is.
2) If the HBase cluster lives in EC2, what's the best thing to do with the
backup/snapshot?  EBS may be too expensive.  Are people stuffing their HBase
backups into S3 somehow, despite the S3 per-object size limit of 5 GB?
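On the S3 size limit: one common workaround is that Export and distcp already split their output into many part files, so each S3 object stays under the per-object cap if you distcp straight to S3 via Hadoop's s3n filesystem. A sketch, with a made-up bucket and key prefix:

```shell
# Hypothetical bucket/prefix. Each MapReduce part file becomes its own
# S3 object, so no single object needs to approach the 5 GB limit.
hadoop distcp hdfs://src-nn:8020/backups/mytable \
  s3n://my-hbase-backups/mytable-2011-03-10
```
Again a command fragment requiring cluster and AWS credentials, so no test attached.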

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
