hbase-issues mailing list archives

From "Li Chongxin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-50) Snapshot of table
Date Wed, 07 Apr 2010 02:33:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854299#action_12854299 ]

Li Chongxin commented on HBASE-50:

@Todd Lipcon: Here are some comments on your questions.

*  How do we make snapshot creation very low impact on the cluster?
According to the proposal, snapshot creation performs no memstore flush and no real copy.
All it does is dump the manifest and metadata and roll the WAL, so I don't think it impacts
the cluster very much. What do you think?

* What happens if the snapshot is initiated during a transition? eg a region is in the middle
of a split or recovery?
In the current implementation, a write lock is acquired when the system is about to perform
a transition. When a snapshot is requested, we can try to acquire this write lock, and initiate
the snapshot only if the lock can be obtained.
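A minimal sketch of that locking rule, using a standard `ReentrantReadWriteLock` (the class and method names here are hypothetical, not the actual region-transition code): the snapshot makes a non-blocking attempt on the same write lock a split or recovery would hold, so it can never start in the middle of a transition.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a snapshot proceeds only if it can grab the same
// write lock a region transition (split/recovery) would hold.
public class SnapshotLockSketch {
    private final ReentrantReadWriteLock transitionLock = new ReentrantReadWriteLock();

    boolean trySnapshot() {
        // Non-blocking attempt: if a transition holds the lock, the
        // snapshot request is rejected rather than made to wait.
        if (transitionLock.writeLock().tryLock()) {
            try {
                return true; // snapshot would be taken here
            } finally {
                transitionLock.writeLock().unlock();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SnapshotLockSketch s = new SnapshotLockSketch();
        System.out.println(s.trySnapshot()); // true: no transition in progress
    }
}
```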

* How do we do the reference counting in an efficient way?
I'm not sure what Jonathan Gray means by hard-link or hard-lock in HDFS. If hard links are
supported in HDFS, then everything is simple, since HDFS will handle the reference counting
of the files. But if hard links are not supported, we will have to count the references
ourselves, probably via metadata, ZooKeeper, or on-disk files. Any ideas?
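The manual-counting alternative could look roughly like the sketch below. The in-memory map stands in for whatever durable store is chosen (ZooKeeper znodes or an on-disk metadata file); all names are hypothetical. The invariant is that a store file is physically deleted from HDFS only when its count reaches zero.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of manual reference counting for store files when
// HDFS hard links are unavailable. The map stands in for durable
// metadata (e.g. ZooKeeper or an on-disk file).
public class RefCountSketch {
    private final Map<String, Integer> refs = new HashMap<>();

    void addRef(String file) {
        refs.merge(file, 1, Integer::sum); // table or snapshot takes a reference
    }

    // Returns true when the last reference is gone and the file
    // can actually be deleted from HDFS.
    boolean release(String file) {
        int n = refs.merge(file, -1, Integer::sum);
        if (n <= 0) {
            refs.remove(file);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RefCountSketch rc = new RefCountSketch();
        rc.addRef("/hbase/t1/f1"); // live table holds the file
        rc.addRef("/hbase/t1/f1"); // a snapshot also references it
        System.out.println(rc.release("/hbase/t1/f1")); // false: snapshot still holds it
        System.out.println(rc.release("/hbase/t1/f1")); // true: now safe to delete
    }
}
```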

* If old files are moved aside after a compaction, how do we deal with concurrent readers
of the snapshot?
I'm not very clear on this question. What's the problem with concurrent readers of the
snapshot? Could you elaborate?

> Snapshot of table
> -----------------
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Priority: Minor
> Having an option to take a snapshot of a table would be very useful in production.
> What I would like this option to do is merge all the data into one or more files stored
in the same folder on the DFS. This way we could save data in case of a software bug in
Hadoop or user code.
> The other advantage would be being able to export a table to multiple locations. Say I had
a read-only table that must be online. I could take a snapshot of it when needed, export it
to a separate data center, and have it loaded there; then I would have it online at multiple
data centers for load balancing and failover.
> I understand that Hadoop removes the need for backups to protect against failed servers,
but this does not protect us from software bugs that might delete or alter data in ways we
did not plan. We should have a way to roll back a dataset.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
