hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-50) Snapshot of table
Date Mon, 07 Jun 2010 18:30:55 GMT

    [ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876352#action_12876352 ]

stack commented on HBASE-50:

On "5 Snapshot Creation":

.bq "Because this table region must be online, dumping the HRegionInfo of the region to a
file ".regioninfo" under the snapshot directory of this region will obtain the metadata."

...The above is wrong, right?  We can snapshot online tables?

+1 on reading .META. data, flushing it to .regioninfo to be sure you have the latest, and then
copying that.  (Or, instead, you could ensure that on any transition, the .regioninfo is updated.
 If this is happening, there is no need to do an extra flush of .META. at snapshot time.  This latter
would be better IMO.)

So, do you foresee your restore-from-snapshot running split over the logs as part of the restore?
 That makes sense to me.

Why do you think we need a Reference to the hfile?  Why not just a file that lists the names
of all the hfiles?  We don't need to execute the snapshot, do we?  Restoring from a snapshot
would be a bunch of file renames and wal splitting?  Or what are you thinking?  (Oh, maybe
I'll find out when I read chapter 6).
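
To make the "file that lists the names of all the hfiles" alternative concrete, here is a minimal sketch. It is not the design in the report: the class and method names (HFileManifest, listHFiles, writeManifest, readManifest) are hypothetical, and it uses java.nio on the local filesystem as a stand-in for HDFS.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a snapshot recorded as a plain list of hfile names.
// The local filesystem stands in for HDFS here.
final class HFileManifest {

    // Collect the names of all regular files in a store directory, sorted.
    static List<String> listHFiles(Path storeDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(storeDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    names.add(f.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    // Write the manifest: one hfile name per line.
    static void writeManifest(Path manifest, List<String> names) throws IOException {
        Files.write(manifest, names);
    }

    // Read the manifest back as a list of hfile names.
    static List<String> readManifest(Path manifest) throws IOException {
        return Files.readAllLines(manifest);
    }
}
```

With something like this, a restore would walk the manifest and rename or copy each listed file back into place; the WAL splitting is a separate step not shown here.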

.bq ....can be created just by the master.

Let's not have the master run the snapshot... let the client run it?

Shall we name the new .META. column family snapshot rather than reference?

I like this idea of keeping region snapshot and reference counting beside the region up in

On the filename '.deleted', I think it's a mistake to give it a '.' prefix, especially given
it's in the snapshot dir (the snapshot dir probably needs to be prefixed with a character illegal
in tablenames such as a '.' so it's not taken for a table directory).

Regarding 'Not sure whether there will be a name collision under this ".deleted" directory',
j-d has done work to ensure WALs are uniquely named.  Storefiles are given a random-id.  We
should probably do the extra work to ensure they are for sure unique... give them a UUID or
something so we don't ever clash.
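
As a rough illustration of the UUID suggestion, the sketch below generates a storefile name from java.util.UUID. The class and method names (StoreFileNames, newStoreFileName) are hypothetical and are not HBase code; stripping the dashes is only an assumption about what a convenient file name would look like.

```java
import java.util.UUID;

// Hypothetical helper, not part of HBase.
final class StoreFileNames {

    // Returns a random (type 4) UUID as 32 lowercase hex characters,
    // with the dashes removed so the result is a compact file name.
    static String newStoreFileName() {
        return UUID.randomUUID().toString().replace("-", "");
    }
}
```

A random UUID carries 122 random bits, so two storefiles moved into the same ".deleted" directory would be extremely unlikely to share a name.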

After reading chapter 6, I fail to see why we should keep References to files.  Maybe I'm
missing something.

.bq Not decides where to keep all the snapshots information, in a meta file under snapshot

Do you need a new catalog table called snapshots to keep a list of snapshots, of what a snapshot
comprises, and some other metadata such as when it was made, whether it succeeded, who did
it and why?

On the other hand, a directory in hdfs of files per snapshot will be more robust.
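
A sketch of the directory-per-snapshot option, assuming one small properties file per snapshot that holds the metadata mentioned above (when, whether it succeeded, who, why). Everything named here is hypothetical: the class SnapshotInfoFile, the file name "snapshotinfo", and the property keys. The local filesystem again stands in for HDFS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch: one directory per snapshot with a metadata file in it.
final class SnapshotInfoFile {

    // Hypothetical file name for the per-snapshot metadata.
    static final String FILE_NAME = "snapshotinfo";

    // Create <snapshotsRoot>/<snapshotName>/snapshotinfo and return its path.
    static Path write(Path snapshotsRoot, String snapshotName, String table,
                      long createdMillis, boolean succeeded,
                      String creator, String reason) throws IOException {
        Path dir = Files.createDirectories(snapshotsRoot.resolve(snapshotName));
        Properties p = new Properties();
        p.setProperty("table", table);
        p.setProperty("created", Long.toString(createdMillis));
        p.setProperty("succeeded", Boolean.toString(succeeded));
        p.setProperty("creator", creator);
        p.setProperty("reason", reason);
        Path file = dir.resolve(FILE_NAME);
        try (OutputStream out = Files.newOutputStream(file)) {
            p.store(out, "snapshot metadata");
        }
        return file;
    }

    // Load the metadata of one snapshot back from its file.
    static Properties read(Path file) throws IOException {
        Properties p = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            p.load(in);
        }
        return p;
    }
}
```

Listing the snapshots is then just a directory listing of the snapshots root, with no catalog table that has to be online.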

Section 7.4 is missing split of WAL files.  Perhaps this can be done in a MR job?

Design looks excellent Li.

> Snapshot of table
> -----------------
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, snapshot-src.zip
> Having an option to take a snapshot of a table would be very useful in production.
> What I would like to see this option do is do a merge of all the data into one or more
files stored in the same folder on the dfs. This way we could save data in case of a software
bug in hadoop or user code.
> The other advantage would be to be able to export a table to multiple locations. Say I had
a read_only table that must be online. I could take a snapshot of it when needed and export
it to a separate data center and have it loaded there and then I would have it online at multiple
data centers for load balancing and failover.
> I understand that hadoop takes the need out of having backup to protect from failed
servers, but this does not protect us from software bugs that might delete or alter data
in ways we did not plan. We should have a way we can roll back a dataset.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
