hbase-user mailing list archives

From Bryan Duxbury <br...@rapleaf.com>
Subject Re: Backup / restore
Date Mon, 25 Feb 2008 19:30:03 GMT
Yes, the actual regionservers are not a part of the schema. The  
assignments are stored in the meta table, but they'll be cleaned up  
when the data is reloaded on another cluster.
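
For reference, a minimal sketch in Python of the offline copy discussed in
this thread. It only builds (and optionally runs) the copy command. The
helper names and the example namenode addresses are hypothetical. It assumes
HBase is shut down on the source side so the rootdir is not changing, and
that the `hadoop` command-line tool is on the PATH. It uses `hadoop fs -cp`
when both paths are on the same filesystem and `hadoop distcp` when they are
on different ones.

```python
import shutil
import subprocess
from urllib.parse import urlparse


def build_copy_command(src_rootdir, dst_rootdir):
    """Return the argv for recursively copying an HBase rootdir.

    Hypothetical helper: picks `hadoop fs -cp` when source and destination
    share a scheme and namenode, and `hadoop distcp` otherwise.
    """
    src, dst = urlparse(src_rootdir), urlparse(dst_rootdir)
    same_fs = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)
    if same_fs:
        # Copy within one HDFS instance (a plain backup).
        return ["hadoop", "fs", "-cp", src_rootdir, dst_rootdir]
    # Copy between two HDFS instances (e.g. production to QA).
    return ["hadoop", "distcp", src_rootdir, dst_rootdir]


def copy_rootdir(src_rootdir, dst_rootdir):
    """Run the copy. Assumes HBase is stopped on the source cluster."""
    if shutil.which("hadoop") is None:
        raise RuntimeError("hadoop CLI not found on PATH")
    subprocess.run(build_copy_command(src_rootdir, dst_rootdir), check=True)
```

After the copy finishes, point the new cluster's hbase.rootdir at the copied
directory and start a master there, as described in the quoted message below.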

-Bryan

On Feb 25, 2008, at 11:24 AM, Marc Harris wrote:

> I think this does answer my question, yes.
>
> So does this mean that the contents of a particular Hbase instance are
> independent of the configured region servers? And that the way an
> instance is split up into regions is independent of the available
> region servers? If so, then, yes, it seems there is nothing specific
> to be done for Hbase for off-line backup.
>
> Now I have to figure out how to (recursively) copy a directory from
> one HDFS instance to another.
>
> Thanks.
> - Marc
>
> On Mon, 2008-02-25 at 10:09 -0800, Bryan Duxbury wrote:
>
>> If an offline backup/restore is acceptable, then we already have it.
>> All you have to do is copy your hbase rootdir to a new location in
>> hdfs, and you've made a backup. You can also use this technique to
>> copy one instance to another - just boot up a master pointed at the
>> new directory and voila.
>>
>> As far as dumping to a single file or a group of sql statements, that
>> seems like it would be a suboptimal way to manage the amount of data
>> you could potentially be working with. At the very least you want
>> many files. It also makes sense to keep them in their region
>> divisions, otherwise it will be an inordinate amount of work to
>> restore into HBase at a later date.
>>
>> Does this answer your question?
>>
>> -Bryan
>>
>> On Feb 25, 2008, at 10:00 AM, Marc Harris wrote:
>>
>>> There has been discussion before about backup / restore but the
>>> discussion has tended to fizzle out. I would like to see backup /
>>> restore functionality for Hbase for the following two purposes:
>>>
>>> 1) Protection against software bugs deleting data. This is not just
>>> the proverbial namenode gone haywire, but user code running in a
>>> map-reduce task that deletes the wrong thing could be just as
>>> disastrous.
>>> 2) Ability to copy one Hbase instance's data to another instance.
>>> It's pretty common in sql-land to run a backup tool that produces a
>>> large file (either a compact export file, or just a sequence of sql
>>> statements). This can then be imported to another instance of the db.
>>>
>>> The particular use case I have is that of a production Hbase
>>> instance and a development or QA instance. It would be useful to be
>>> able to dump the production instance periodically, and then load it
>>> into a development instance so that new code could be run against it.
>>>
>>> I think this would be Hbase specific, not a general Hadoop dump /
>>> restore, because only the logical data should be transferred, not
>>> the precise structure of how tables are split into regions. Does
>>> such a thing exist?
>>

