hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Hbase bkup options
Date Mon, 23 Jul 2012 23:03:03 GMT
There are a couple of nits...

1) Compression. This will help a bit when moving the files around; an example
of a compressed export follows below.

2) Data size. You may have bandwidth issues. Moving TBs of data over a 1 GbE
network can impact your cluster's performance, even with compression.
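For example, you can turn compression on when running the Export M/R job.
A sketch, assuming Hadoop 1.x property names; the table name and output path
are placeholders:

  # export 'mytable' to HDFS with the output files gzip-compressed
  hbase org.apache.hadoop.hbase.mapreduce.Export \
    -D mapred.output.compress=true \
    -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
    mytable /backup/mytable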

Depending on your cluster(s) and infrastructure, there is going to be a point
where the cost of backing up to tape exceeds the cost of replicating to a
second cluster. At the same time, remember that restoring TBs of data will take time.

The data size at which that happens will vary by organization. Again, only you
can determine the value of your data.

If you are backing up to a secondary cluster ... you can use the replication feature in HBase.
This would be a better fit if you are looking at backing up a large set of HBase tables. 
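A rough sketch in the hbase shell (the peer id, ZooKeeper quorum, table and
column family names are placeholders, and it assumes hbase.replication is
already set to true in hbase-site.xml on both clusters):

  # on the source cluster: register the backup cluster as a replication peer
  add_peer '1', 'zk1,zk2,zk3:2181:/hbase'

  # mark each column family you want shipped as replicated
  disable 'mytable'
  alter 'mytable', {NAME => 'cf', REPLICATION_SCOPE => 1}
  enable 'mytable'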


On Jul 23, 2012, at 10:33 AM, Amlan Roy wrote:

> Hi Michael,
> 
> Thanks a lot for the reply. What I want to achieve is: if my cluster goes
> down for some reason, I should be able to create a new cluster and import
> all the backed-up data. Since I want to store all the tables, I expect the
> data size to be huge (on the order of terabytes), and it will keep growing.
> 
> If I have understood correctly, you have suggested running "export" to get
> the data into HDFS and then running "hadoop fs -copyToLocal" to get it onto
> local disk. If I take a backup of those files, is it possible to import that
> data into a new HBase cluster?
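> For example, something like this on the new cluster? (The table name, column
> family, and paths below are just placeholders.)
> 
>   # copy the backed-up files back into the new cluster's HDFS
>   hadoop fs -copyFromLocal /mnt/backup/mytable /backup/mytable
> 
>   # recreate the table first (Import does not create it), then load the data
>   echo "create 'mytable', 'cf'" | hbase shell
>   hbase org.apache.hadoop.hbase.mapreduce.Import mytable /backup/mytable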
> 
> Thanks and regards,
> Amlan
> 
> -----Original Message-----
> From: Michael Segel [mailto:michael_segel@hotmail.com] 
> Sent: Monday, July 23, 2012 8:19 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase bkup options
> 
> Amlan, 
> 
> As always, the answer to your question is... it depends.
> 
> First, how much data are we talking about? 
> 
> What's the value of the underlying data? 
> 
> One possible scenario...
> You run an M/R job to copy data from the table to an HDFS file, which is
> then copied to attached storage on an edge node and then to tape.
> Depending on how much data you have and how much disk is in the attached
> storage, you may want to keep a warm copy there, a 'warmer/hot' copy on
> HDFS, and a cold copy on tape at some offsite storage facility.
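> A rough sketch of the HDFS-to-edge-node-to-tape leg (the paths and tape
> device below are placeholders):
> 
>   # pull the exported files out of HDFS onto the edge node's attached storage
>   hadoop fs -copyToLocal /backup/mytable /mnt/backup/mytable
> 
>   # write the cold copy to tape (assumes a tape drive at /dev/st0)
>   tar cvf /dev/st0 /mnt/backup/mytable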
> 
> There are other options, but it all depends on what you want to achieve. 
> 
> With respect to the other tools...
> 
> You can run an export (which is an M/R job) to an HDFS directory, then use
> distcp to copy it to a different cluster. hadoop fs -copyToLocal will let
> you copy it off the cluster entirely.
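> For example, the cluster-to-cluster copy might look like this (the NameNode
> addresses and paths are placeholders):
> 
>   # copy the export from the live cluster's HDFS to the backup cluster's
>   hadoop distcp hdfs://src-nn:8020/backup/mytable \
>     hdfs://backup-nn:8020/backup/mytable
> 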
> You could write your own code, but you don't get much gain over existing
> UNIX/Linux tools. 
> 
> 
> On Jul 23, 2012, at 7:52 AM, Amlan Roy wrote:
> 
>> Hi,
>> 
>> Is it feasible to do disk or tape backup for HBase tables?
>> 
>> I have read about tools like Export, CopyTable, and DistCp. It seems like
>> they would require a separate HDFS cluster to do that.
>> 
>> Regards,
>> Amlan

