hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Newman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-50) Snapshot of table
Date Tue, 17 Nov 2009 05:07:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778724#action_12778724
] 

Alex Newman commented on HBASE-50:
----------------------------------

Say you flushed the logs and then ran a compaction and waited for the cluster to chill out.
Unless you had extremely  high churn rates I would suggest: A mapreduce with a region per
task which fails and retries in case of flushes or compactions. In the case of a split you
can fail the job, disable splitting, or have some way of getting the children later.

Even though you would have kindof data throughout a time period, you would at least have a
timestamp of when that backup was made. I.E. consistency on a region level which is all a
lot of us really want.

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Alex Newman
>            Priority: Minor
>
> Havening an option to take a snapshot of a table would be vary useful in production.
> What I would like to see this option do is do a merge of all the data into one or more
files stored in the same folder on the dfs. This way we could save data in case of a software
bug in hadoop or user code. 
> The other advantage would be to be able to export a table to multi locations. Say I had
a read_only table that must be online. I could take a snapshot of it when needed and export
it to a separate data center and have it loaded there and then i would have it online at multi
data centers for load balancing and failover.
> I understand that hadoop takes the need out of havening backup to protect from failed
servers, but this does not protect use from software bugs that might delete or alter data
in ways we did not plan. We should have a way we can roll back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message