hbase-issues mailing list archives

From "Li Chongxin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-50) Snapshot of table
Date Wed, 30 Jun 2010 03:27:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883779#action_12883779 ]

Li Chongxin commented on HBASE-50:

bq. isSnapshot in HRI? 
bq. Will keeping snapshot data in .META. work? .META. is by region but regions are deleted
after a split but you want your snapshot to live beyond this?

The snapshot data, which is really the reference counts of the hfiles, will be kept in the
.META. table, but in a row separate from the original region row, so this reference count
information will not be deleted after a split. It is stored in .META. because it also follows
a region-centric view: the reference counts for all of a region's hfiles are kept together
in one row in .META., whether an hfile is still in use or has been archived. I described
this in Appendix A of the document.
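
As a rough illustration of the layout described above, here is a minimal in-memory sketch. It is not HBase code and is not taken from the design document: the class MetaSketch, the "snapshotref:" row-key prefix and all method names are hypothetical, and a plain map stands in for .META.. It only shows why a separate row lets the reference counts outlive the region row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the .META. table (assumption, not HBase API).
class MetaSketch {
    // row key -> (column qualifier -> value)
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    // Hypothetical key scheme: reference counts live in their own row per region.
    static String refRowKey(String regionName) {
        return "snapshotref:" + regionName;
    }

    // Adds the ordinary region row.
    void putRegion(String regionName) {
        rows.computeIfAbsent(regionName, k -> new HashMap<>())
            .put("info:regioninfo", regionName);
    }

    // Increments the reference count of one hfile in the region's reference row.
    void incrementRef(String regionName, String hfile) {
        rows.computeIfAbsent(refRowKey(regionName), k -> new HashMap<>())
            .merge(hfile, "1", (old, one) -> Integer.toString(Integer.parseInt(old) + 1));
    }

    // Returns the stored reference count, or 0 if none is recorded.
    int refCount(String regionName, String hfile) {
        Map<String, String> row = rows.get(refRowKey(regionName));
        if (row == null) {
            return 0;
        }
        String value = row.get(hfile);
        return value == null ? 0 : Integer.parseInt(value);
    }

    // Models the cleanup after a split: only the region row itself is removed.
    void deleteRegionRow(String regionName) {
        rows.remove(regionName);
    }

    boolean hasRow(String rowKey) {
        return rows.containsKey(rowKey);
    }
}
```

In this sketch, calling deleteRegionRow for a region leaves the row under refRowKey untouched, so refCount still returns the counts recorded before the split.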

bq. In zk, writeZnode and readZnode ain't the best names for methods... what kinda znodes
are these? (Jon says these already exist, that they are not your fault)

Actually, the snapshot method names in ZooKeeperWrapper are startSnapshotOnZK, abortSnapshotOnZK
and registerRSForSnapshot. I put writeZnode and readZnode in the diagram because I think I can
use them inside those methods.
Do you think we should make writeZnode and readZnode private and use them only inside ZooKeeperWrapper?
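
To make the question concrete, here is a minimal sketch of that arrangement, with the snapshot-specific methods public and the generic znode helpers private. Only the five method names come from the discussion; the class name ZooKeeperWrapperSketch, the znode paths, the parameters and the two read accessors are assumptions for illustration, and a map stands in for ZooKeeper so the sketch runs without a cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch only: an in-memory map replaces the real ZooKeeper client (assumption).
class ZooKeeperWrapperSketch {
    // Hypothetical znode layout; the real paths are not given in the discussion.
    private static final String SNAPSHOT_ROOT = "/hbase/snapshot";

    private final Map<String, byte[]> znodes = new HashMap<>();

    // Public, snapshot-specific API (names from the discussion, signatures assumed).
    public void startSnapshotOnZK(String tableName) {
        writeZnode(SNAPSHOT_ROOT + "/start", tableName);
    }

    public void abortSnapshotOnZK() {
        writeZnode(SNAPSHOT_ROOT + "/abort", "true");
    }

    public void registerRSForSnapshot(String regionServerName) {
        writeZnode(SNAPSHOT_ROOT + "/ready/" + regionServerName, "");
    }

    // Hypothetical read accessors, added here only so the sketch can be exercised.
    public String currentSnapshotTable() {
        return readZnode(SNAPSHOT_ROOT + "/start");
    }

    public boolean isRegistered(String regionServerName) {
        return readZnode(SNAPSHOT_ROOT + "/ready/" + regionServerName) != null;
    }

    // Generic helpers kept private so callers only see snapshot-level operations.
    private void writeZnode(String path, String data) {
        znodes.put(path, data.getBytes(StandardCharsets.UTF_8));
    }

    // Returns null when the znode does not exist.
    private String readZnode(String path) {
        byte[] data = znodes.get(path);
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }
}
```

With this shape, code outside the wrapper cannot write arbitrary znodes; it can only go through the named snapshot operations.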

bq. Can you make a SnapShot class into which encapsulate all related to snapshotting rather
than adding new data members to HMaster? Maybe you do encapsulate it all into snapshotmonitor?

I haven't figured out all the data members in the design yet. If necessary, I will create a
Snapshot class during implementation to encapsulate the related fields.

bq. Can you call RSSnapshotHandler just SnapshotHandler?


bq. You probably don't need to support String overloads.

You mean methods in HBaseAdmin?

A repository has been created on GitHub with the initial content of hbase/trunk.

> Snapshot of table
> -----------------
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png
> Having an option to take a snapshot of a table would be very useful in production.
> What I would like to see this option do is a merge of all the data into one or more files stored in the same folder on the dfs. This way we could save data in case of a software bug in hadoop or user code.
> The other advantage would be to be able to export a table to multiple locations. Say I had a read_only table that must be online. I could take a snapshot of it when needed and export it to a separate data center and have it loaded there, and then I would have it online at multiple data centers for load balancing and failover.
> I understand that hadoop takes away the need for backups to protect against failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.

