hbase-user mailing list archives

From "Dan Zinngrabe" <...@mahalo.com>
Subject Re: Back Up Strategies
Date Mon, 22 Sep 2008 19:28:15 GMT
On Mon, Sep 22, 2008 at 12:13 PM, Charles Mason <charlie.mas@gmail.com> wrote:
> Hi All,
>
> I was wondering what options there are for backing up and dumping an
> HBase database. I appreciate that having it run on top of a HDFS
> cluster can protect against individual node failure. However that
> still doesn't protect against the massive but thankfully rare
> disasters which take out whole server racks, fire, floods, etc...

Something for this will be released this week :)

>
> As far as I can tell there are two options:
>
> 1, Scan each table and dump the entire row to some external location,
> like MySQL Dump does for MySQL. Then to recover simply put the new
> data back. I am sure the performance of this is going to be fairly
> bad.

It's not as bad as you may think, though we have not tested it on very
large clusters. Depending on your configuration, importing a backup is
usually the most costly operation, since regions split as the data is
loaded.
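
To make the dump-and-reload idea concrete, here is a minimal sketch in
Python. It is not the tool mentioned above and does not use any real HBase
client API. The names dump_table and restore_table, the JSON-lines format,
and the (row key, {column: value}) row shape are all assumptions made for
illustration. It also ignores cell timestamps and versions.

```python
import base64
import json
from typing import Callable, Dict, Iterable, Tuple

# Assumed row shape: (row key, {column name: cell value}), all raw bytes.
Row = Tuple[bytes, Dict[bytes, bytes]]


def _enc(raw: bytes) -> str:
    # Base64 keeps arbitrary binary keys and values safe inside JSON.
    return base64.b64encode(raw).decode("ascii")


def _dec(text: str) -> bytes:
    return base64.b64decode(text)


def dump_table(rows: Iterable[Row], out) -> int:
    """Write one JSON line per row to a text file object; return the row count.

    `rows` stands in for whatever a table scanner yields (hypothetical).
    """
    count = 0
    for key, cells in rows:
        record = {
            "row": _enc(key),
            "cells": {_enc(col): _enc(val) for col, val in cells.items()},
        }
        out.write(json.dumps(record) + "\n")
        count += 1
    return count


def restore_table(src, put: Callable[[bytes, Dict[bytes, bytes]], None]) -> int:
    """Read a dump line by line and hand each row to `put`; return the row count.

    `put` stands in for the client's write call (hypothetical).
    """
    count = 0
    for line in src:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump file
        record = json.loads(line)
        cells = {_dec(col): _dec(val) for col, val in record["cells"].items()}
        put(_dec(record["row"]), cells)
        count += 1
    return count
```

With a real client, rows would come from a scan over the table and put
would wrap the client's write call. Streaming one row at a time keeps memory
use flat on both the dump and the restore side, and the restore is where
region splits would make themselves felt.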

>
> 2, Image the data stored on the HDFS cluster. Aren't there some big
> issues with it not grabbing a consistent image as some updates won't
> be flushed? Is there any way to force that, or to make it be
> consistent some way, perhaps via snapshotting?

That's correct, and we were not able to come up with a good way to
snapshot HBase. It either took much, much longer than dumping the data
out of a table, or gave us inconsistent data. Maybe this will be
easier in a future HBase release, but for now it's probably not
something you'd want to do with production data.

>
> Have I missed anything? Anyone got any suggestions?
>
> Charlie M
>



-- 
Dan Zinngrabe
Alchemist -- Mahalo.com
http://www.mahalo.com/member/quellish
dan@mahalo.com
