hbase-issues mailing list archives

From "Li Chongxin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-50) Snapshot of table
Date Fri, 18 Jun 2010 07:13:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880093#action_12880093

Li Chongxin commented on HBASE-50:

bq. Fail with a warning. A nice-to-have would be your suggestion of restoring snapshot into
a table named something other than the original table's name (Fixing this issue is low-priority
bq. .. it's a good idea to allow snapshot restore to a new table name while the original table
is still online. And the restored snapshot should be able to share HFiles with the original

I will make this issue a low-priority sub-task. One more question: besides the metadata and
log files, what other data needs to be taken care of when renaming the snapshot to a new table
name? Are there any other files (e.g. HFiles) that contain the table name?

bq. ... didn't we discuss that .META. might not be the place to keep snapshot data since regions
are deleted when the system is done w/ them (but a snapshot may outlive a particular region).

I misunderstood... I thought you were talking about creating a new catalog table 'snapshot'
to keep the metadata of snapshots, such as creation time.
In the current design, a region will not be deleted if it is still used by a snapshot, even if
the system is done with it. Such a region would probably be marked as 'deleted' in .META. This
is discussed in sections 6.2 and 6.3, and no new catalog table is added. Do you think it is
appropriate to keep metadata in .META. for a deleted region? Do we still need a new catalog table?
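
To make the rule above concrete, here is a minimal sketch of the bookkeeping it implies: a region may only be physically removed once no snapshot refers to it. The class and method names (RegionReferenceTracker, addReference, removeSnapshot, canDelete) are hypothetical and are not taken from the design report or the HBase code base; where this state would actually be stored (.META. or elsewhere) is exactly the open question.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory bookkeeping; not an HBase class.
final class RegionReferenceTracker {
    // region name -> names of the snapshots that still reference it
    private final Map<String, Set<String>> snapshotsByRegion = new HashMap<>();

    // Record that the given snapshot uses the given region.
    void addReference(String region, String snapshot) {
        snapshotsByRegion.computeIfAbsent(region, k -> new HashSet<>()).add(snapshot);
    }

    // Drop a snapshot and release every region reference it held.
    void removeSnapshot(String snapshot) {
        for (Set<String> snapshots : snapshotsByRegion.values()) {
            snapshots.remove(snapshot);
        }
        snapshotsByRegion.values().removeIf(Set::isEmpty);
    }

    // A region can be deleted only when no snapshot references it.
    boolean canDelete(String region) {
        return !snapshotsByRegion.containsKey(region);
    }
}
```

With this model, a region that the system is done with but that a snapshot still references would stay on disk (marked 'deleted') until removeSnapshot releases the last reference.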

bq. rather than causing all of the RS to roll the logs, they could simply record the log sequence
number of the snapshot, right? This will be a bit faster to do and causes even less of a "hiccup"
in concurrent operations (and I don't think it's any more complicated to implement, is it?)

Yes, sounds good. The log sequence number should also be taken into account when the logs are
split, because log files would contain data from both before and after the snapshot, right?
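
As a rough illustration of the point about splitting: if each region server records a sequence number at snapshot time, a log file that spans the snapshot can be filtered down to the edits that belong to it. The sketch below uses a simplified, hypothetical LogEntry type rather than HBase's real log classes, and it assumes the recorded sequence number is inclusive (an edit with exactly that number is part of the snapshot); the actual boundary would have to be settled in the design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified log entry; not HBase's real write-ahead-log type.
final class LogEntry {
    final long sequenceId;
    final String regionName;

    LogEntry(long sequenceId, String regionName) {
        this.sequenceId = sequenceId;
        this.regionName = regionName;
    }
}

final class SnapshotLogFilter {
    // Returns the entries written at or before the snapshot's recorded sequence number.
    // Assumption: the boundary is inclusive.
    static List<LogEntry> entriesInSnapshot(List<LogEntry> log, long snapshotSequenceId) {
        List<LogEntry> result = new ArrayList<>();
        for (LogEntry entry : log) {
            if (entry.sequenceId <= snapshotSequenceId) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

Entries with a higher sequence number were written after the snapshot and would be skipped when the snapshot's logs are split or replayed.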

bq. Making the client orchestrate the snapshot process seems a little strange - could the
client simply initiate it and put the actual snapshot code in the master? I think we should
keep the client as thin as we can

OK, this will change the design a little.

bq. I'd be interested in a section about failure analysis - what happens when the snapshot
coordinator fails in the middle? ..

That will be great!

> Snapshot of table
> -----------------
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report
V3.pdf, snapshot-src.zip
> Having an option to take a snapshot of a table would be very useful in production.
> What I would like this option to do is merge all the data into one or more
files stored in the same folder on the DFS. This way we could save data in case of a software
bug in Hadoop or user code.
> The other advantage would be the ability to export a table to multiple locations. Say I had
a read_only table that must be online. I could take a snapshot of it when needed, export
it to a separate data center, and have it loaded there; then I would have it online at multiple
data centers for load balancing and failover.
> I understand that Hadoop removes the need for backups to protect against failed
servers, but this does not protect us from software bugs that might delete or alter data
in ways we did not plan. We should have a way to roll back a dataset.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
