hadoop-common-dev mailing list archives

From "Billy Pearson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2496) Snapshot of table
Date Fri, 28 Dec 2007 09:01:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554663 ]

Billy Pearson commented on HADOOP-2496:

A backup/snapshot should take each region as is and copy it to a folder. The metadata for
the table should also be backed up with the snapshot.
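
As a rough sketch of the idea only (this is not HBase code), the backup could look something like the following. It assumes a table is just a directory with one sub-directory per region, and it uses the local filesystem in place of the DFS. The names snapshot_table, meta.json and the regions/ sub-folder are made up for illustration.

```python
import json
import shutil
import tempfile
from pathlib import Path


def snapshot_table(table_dir, meta, snapshot_dir):
    """Copy every region directory of a table, plus its metadata, into snapshot_dir.

    Hypothetical helper: `table_dir` holds one sub-directory per region and
    `meta` is a plain dict standing in for the table's metadata.
    """
    regions_out = Path(snapshot_dir) / "regions"
    regions_out.mkdir(parents=True)
    copied = []
    for region in sorted(p for p in Path(table_dir).iterdir() if p.is_dir()):
        # Copy the region "as is": no merging or rewriting of its files.
        shutil.copytree(region, regions_out / region.name)
        copied.append(region.name)
    # Keep the table metadata next to the copied regions.
    (Path(snapshot_dir) / "meta.json").write_text(json.dumps(meta, indent=2))
    return copied


# Example: a throwaway table with two regions on the local filesystem.
root = Path(tempfile.mkdtemp())
table = root / "mytable"
for name in ("region-a", "region-b"):
    (table / name).mkdir(parents=True)
    (table / name / "data").write_text(name)

copied = snapshot_table(
    table,
    {"table": "mytable", "regions": ["region-a", "region-b"]},
    root / "snapshot",
)
print(copied)
```

Copying regions unchanged keeps the snapshot cheap to take, and it means a later restore only has to copy files back rather than re-insert rows.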

For a fast restore we could:
1. stop serving (disable) the table
2. delete the current regions and metadata for the table
3. copy the backup regions into the correct locations for hbase region serving
4. reload the backup metadata
5. enable the table

On the next rescan by the master, the new metadata would be picked up and the master could start
assigning the regions to regionservers. This way no time is spent reloading the data.
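
The restore steps above could be sketched roughly as follows. Again this is only an illustration under assumptions: ToyCatalog is a made-up stand-in for the table's metadata and enabled/disabled state, the local filesystem stands in for the DFS, and the snapshot layout (a regions/ folder plus meta.json) is hypothetical. None of it is an HBase API.

```python
import json
import shutil
import tempfile
from pathlib import Path


class ToyCatalog:
    """Hypothetical stand-in for a table's metadata and enabled flag."""

    def __init__(self):
        self.enabled = True
        self.meta = {}


def restore_table(catalog, table_dir, snapshot_dir):
    """Replace a table's regions and metadata with the contents of a snapshot."""
    table_dir, snapshot_dir = Path(table_dir), Path(snapshot_dir)
    # 1. Stop serving (disable) the table.
    catalog.enabled = False
    # 2. Delete the current regions and metadata for the table.
    if table_dir.exists():
        shutil.rmtree(table_dir)
    catalog.meta = {}
    # 3. Copy the backup regions into the serving location.
    shutil.copytree(snapshot_dir / "regions", table_dir)
    # 4. Reload the backup metadata.
    catalog.meta = json.loads((snapshot_dir / "meta.json").read_text())
    # 5. Enable the table again; region assignment would follow from here.
    catalog.enabled = True


# Example: a snapshot holding one region, and a live table holding stale data.
root = Path(tempfile.mkdtemp())
snap = root / "snapshot"
(snap / "regions" / "region-a").mkdir(parents=True)
(snap / "regions" / "region-a" / "data").write_text("good")
(snap / "meta.json").write_text(json.dumps({"regions": ["region-a"]}))

table = root / "mytable"
(table / "region-stale").mkdir(parents=True)

catalog = ToyCatalog()
catalog.meta = {"regions": ["region-stale"]}
restore_table(catalog, table, snap)
print(sorted(p.name for p in table.iterdir()), catalog.meta, catalog.enabled)
```

The point of the ordering is that the table is disabled before anything is deleted and only enabled once both the region files and the metadata are back in place, so nothing is served from a half-restored state.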

> Snapshot of table
> -----------------
>                 Key: HADOOP-2496
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2496
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>             Fix For: 0.16.0
> Having an option to take a snapshot of a table would be very useful in production.
> What I would like this option to do is merge all the data into one or more
> files stored in the same folder on the DFS. This way we could save data in case of a software
> bug in Hadoop or user code.
> The other advantage would be the ability to export a table to multiple locations. Say I had
> a read-only table that must be online. I could take a snapshot of it when needed and export
> it to a separate data center and have it loaded there, and then I would have it online at multiple
> data centers for load balancing and failover.
> I understand that Hadoop removes the need for backups to protect against failed
> servers, but this does not protect us from software bugs that might delete or alter data
> in ways we did not plan. We should have a way to roll back a dataset.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
