hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-50) Snapshot of table
Date Fri, 21 Mar 2008 03:01:24 GMT

    [ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580992#action_12580992

stack commented on HBASE-50:

Other ideas.  A command on the master would send a signal to all regionservers.  They would
dump their in-memory content and tell the master when done.  They would then block, taking
reads but no updates, until they got the all-clear from the master.  The master would then
list the current content of the filesystem and dump a listing of all files.  That
all-files listing could then be used as input for a distcp job.  The master would either wait
for a prompt from the admin that the distcp was complete, or it would give the all-clear right
after dumping the catalog of all files; in the latter case, instead of deleting files on
compaction or region delete, files would get a '.deleted' suffix.  The running distcp, if it
couldn't find an original file, would look for the same file with the '.deleted' suffix and
copy that instead.
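The copy-side fallback described above can be sketched in a few lines. This is only an illustration, not HBase code: the class name, the DELETED_SUFFIX constant, and the resolveSource helper are all hypothetical, and java.nio.file stands in for the HDFS client API a real distcp job would use. The idea is just that a file listed in the snapshot manifest may have been renamed with a '.deleted' suffix by the time the copy reaches it, so the copier checks the renamed path before giving up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the '.deleted'-suffix fallback: if the
// original file from the snapshot listing is gone (compacted or
// removed since the listing was taken), look for the same file
// renamed with a '.deleted' suffix and copy that instead.
public class SnapshotCopySketch {

    static final String DELETED_SUFFIX = ".deleted";

    // Resolve the path the copy job should actually read from.
    static Path resolveSource(Path original) throws IOException {
        if (Files.exists(original)) {
            return original;  // fast path: file is still where the listing said
        }
        Path renamed = Paths.get(original.toString() + DELETED_SUFFIX);
        if (Files.exists(renamed)) {
            return renamed;   // file was renamed rather than deleted
        }
        throw new IOException("Snapshot file missing: " + original);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snapshot-demo");
        // Simulate a store file that was "deleted" (renamed) after the listing.
        Path listed = dir.resolve("region1").resolve("storefile-42");
        Files.createDirectories(listed.getParent());
        Files.write(Paths.get(listed + DELETED_SUFFIX), "data".getBytes());

        Path actual = resolveSource(listed);
        System.out.println(actual.toString().endsWith(DELETED_SUFFIX)); // prints "true"
    }
}
```

Note that this only works because the proposal defers real deletion until after the copy completes: the rename-instead-of-delete rule is what guarantees one of the two paths still exists.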

> Snapshot of table
> -----------------
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Priority: Minor
> Having an option to take a snapshot of a table would be very useful in production.
> What I would like to see this option do is a merge of all the data into one or more
> files stored in the same folder on the dfs. This way we could save data in case of a software
> bug in hadoop or user code.
> The other advantage would be the ability to export a table to multiple locations. Say I had
> a read_only table that must be online. I could take a snapshot of it when needed, export
> it to a separate data center, and have it loaded there; then I would have it online at
> multiple data centers for load balancing and failover.
> I understand that hadoop removes the need for backups to protect against failed
> servers, but that does not protect us from software bugs that might delete or alter data
> in ways we did not plan. We should have a way to roll back a dataset.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
