hbase-issues mailing list archives

From "Jesse Yates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96
Date Tue, 10 Jul 2012 18:38:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410682#comment-13410682 ]

Jesse Yates commented on HBASE-6230:

 Restore Table

Given a "snapshot name", restore overwrites the original table with the snapshot content.
Before restoring, a new snapshot of the table is taken as a safeguard in case the restore turns out to be a mistake.
(If the table is not disabled, we can keep serving reads.)

This allows a full and quick rollback to a previous snapshot.

+1 on the general design.

How does this correspond to restoring a table from a snapshot when the table doesn't exist?
I feel like this should be a semantically different use case, though the underlying implementation
will probably differ only in not taking a snapshot of the existing table, since there is no
existing table to snapshot. I'd propose renaming Restore to Rollback, and then having Restore mean
just taking a snapshot and creating a table from it. On an external cluster, this means the
exported snapshot is then 'restored' on that remote cluster.
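To make the proposed naming concrete, here is a minimal sketch assuming a toy in-memory model in which a table and a snapshot are both just maps of row key to value. `SnapshotCatalog`, `rollback`, and `restore` are hypothetical names, not HBase API: rollback requires an existing table and takes a safety snapshot first, while restore creates a new table and refuses to overwrite one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model (not HBase API): table name -> rows, and
// snapshot name -> frozen copy of a table's rows.
class SnapshotCatalog {
    final Map<String, Map<String, String>> tables = new HashMap<>();
    final Map<String, Map<String, String>> snapshots = new HashMap<>();

    // Freeze the current contents of an existing table under snapshotName.
    void snapshot(String table, String snapshotName) {
        Map<String, String> data = tables.get(table);
        if (data == null) throw new IllegalArgumentException("no such table: " + table);
        snapshots.put(snapshotName, new HashMap<>(data));
    }

    // "Rollback": the table must already exist. Take a safety snapshot of the
    // current state first, then overwrite the table with the snapshot content.
    void rollback(String table, String snapshotName, String safetySnapshotName) {
        Map<String, String> snap = requireSnapshot(snapshotName);
        snapshot(table, safetySnapshotName); // fails if the table does not exist
        tables.put(table, new HashMap<>(snap));
    }

    // "Restore": create a brand-new table from a snapshot. No safety snapshot
    // is taken because there is no existing table to protect.
    void restore(String snapshotName, String newTable) {
        if (tables.containsKey(newTable)) throw new IllegalStateException("table exists: " + newTable);
        tables.put(newTable, new HashMap<>(requireSnapshot(snapshotName)));
    }

    private Map<String, String> requireSnapshot(String name) {
        Map<String, String> snap = snapshots.get(name);
        if (snap == null) throw new IllegalArgumentException("no such snapshot: " + name);
        return snap;
    }
}
```

Under this split, an exported snapshot on a remote cluster would only ever go through `restore`, since that cluster has no existing table to roll back.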

Clone Snapshot

This could be very, very tricky in terms of multiple tables reading the same files. You would
have to make sure that no other table is using the current HFiles when a compaction comes
around. Otherwise, when you archive the files, you will break the other table that is using those
files. Maybe there is some behavior in HDFS that will blow up on you when trying to move a
file someone else is currently reading, but that would take some investigation. I have a feeling
there is also a bunch of code that assumes a certain file layout, which will make this
hard. I'm not saying it's not doable, but it's not going to be trivial.
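One hypothetical way to handle the shared-file problem is reference counting: record which tables use each HFile, and let a compaction archive a file only once the last table has released it. The sketch assumes an in-memory tracker (`HFileRefTracker` is an invented name, not an HBase class) and ignores persistence, concurrency, and the open HDFS question about moving files that are being read.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker (not HBase API): which tables currently reference each HFile.
class HFileRefTracker {
    private final Map<String, Set<String>> refs = new HashMap<>();

    // Record that `table` (an original or a clone) reads `hfile`.
    void addReference(String hfile, String table) {
        refs.computeIfAbsent(hfile, k -> new HashSet<>()).add(table);
    }

    // Called when `table` no longer needs `hfile`, e.g. after compacting it away.
    // Returns true only if no other table still uses the file, i.e. it is safe to archive.
    boolean release(String hfile, String table) {
        Set<String> users = refs.get(hfile);
        if (users == null || !users.remove(table)) {
            throw new IllegalStateException(table + " does not reference " + hfile);
        }
        if (users.isEmpty()) {
            refs.remove(hfile);
            return true;
        }
        return false;
    }

    boolean isReferenced(String hfile) {
        return refs.containsKey(hfile);
    }
}
```

With this bookkeeping, compacting the original table releases its references, but the files stay in place until the clone releases them too.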

* To restore only "individual items" (only some small range of data was lost from "current")
** An MR job that scans the cloned table and updates the data in the original one (partial restore of the data).

This seems like a slightly more difficult proposal. I'm not averse to doing this, but it isn't
a trivial operation and should probably be handled by a Map/Reduce job that exports
to a 'small' (depending on data size), temporary table so we can easily filter out the right
ranges without having to stand up a special region or do a ton of compactions. This makes
it an inherently slower operation, but it should be performant enough for recovering
data, and it makes a lot of sense for recovering a very large chunk in terms of overall throughput
(though at that point you probably just want to restore a clone).
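The range-filtering step of a partial restore can be sketched in a single process, assuming tables are modeled as sorted maps from row key to value. `PartialRestore.restoreRange` is a hypothetical stand-in for what the Map/Reduce job would do for one row range, not an existing HBase tool.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical single-process stand-in for the partial-restore job: copy only
// the rows in [startRow, stopRow) from a snapshot clone back into the original.
class PartialRestore {
    static int restoreRange(NavigableMap<String, String> clone,
                            NavigableMap<String, String> original,
                            String startRow, String stopRow) {
        // Start row inclusive, stop row exclusive, like an HBase scan range.
        NavigableMap<String, String> range = clone.subMap(startRow, true, stopRow, false);
        // Overwrite (or re-create) the lost rows in the original table.
        original.putAll(range);
        return range.size();
    }
}
```

Rows inside the range that were written to the original after the snapshot are left alone here; whether to delete them instead is a policy choice this sketch does not make.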

This brings up another potential nicety: a snapshot-and-clone operation, which takes a snapshot
of the existing table and then stands up a clone of that data. It is a small addition to the interface
and, to me, what a real 'clone' operation should do.
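A minimal sketch of the combined snapshot-and-clone call, again assuming a toy in-memory model (`SnapshotAndClone` and its fields are hypothetical, not HBase API). The clone is built from the snapshot rather than from the live table, so it reflects exactly the snapshotted state even if the source table keeps taking writes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical composite operation (not HBase API): snapshot, then clone from it.
class SnapshotAndClone {
    final Map<String, Map<String, String>> tables = new HashMap<>();
    final Map<String, Map<String, String>> snapshots = new HashMap<>();

    void snapshotAndClone(String table, String snapshotName, String cloneName) {
        Map<String, String> data = tables.get(table);
        if (data == null) throw new IllegalArgumentException("no such table: " + table);
        if (tables.containsKey(cloneName)) throw new IllegalStateException("table exists: " + cloneName);
        if (snapshots.containsKey(snapshotName)) throw new IllegalStateException("snapshot exists: " + snapshotName);
        // Step 1: take the snapshot as a read-only point-in-time copy.
        Map<String, String> snap = Collections.unmodifiableMap(new HashMap<>(data));
        snapshots.put(snapshotName, snap);
        // Step 2: stand up the clone from the snapshot, not from the live table.
        tables.put(cloneName, new HashMap<>(snap));
    }
}
```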

Export Snapshot
+1. Let the remote cluster restore the snapshot if and when it wants to; don't force a table
to be stood up immediately.

> [brainstorm] "Restore" snapshots for HBase 0.96
> -----------------------------------------------
>                 Key: HBASE-6230
>                 URL: https://issues.apache.org/jira/browse/HBASE-6230
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Jesse Yates
>            Assignee: Matteo Bertozzi
> Discussion ticket around the definitions/expectations of different parts of snapshot restoration. This is complementary to, but separate from, the _how_ of taking a snapshot of a

This message is automatically generated by JIRA.