hbase-issues mailing list archives

From "Li Chongxin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-50) Snapshot of table
Date Mon, 12 Apr 2010 03:25:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855824#action_12855824 ]

Li Chongxin commented on HBASE-50:
----------------------------------

@Todd Lipcon
bq. This is the part I'm not clear on. Are you attempting to achieve a simultaneous write
lock across all region servers in the cluster? Also will have to make sure that we "lock"
any regions that are currently being moved, etc. Doing this without impacting realtime workload
on the cluster is the tricky part in my opinion.

Sorry, my mistake: a read lock, rather than a write lock, should be acquired when the snapshot
is requested. This read lock only blocks regions that are currently being moved; other operations
such as Get, Put, Delete etc. are not impacted at all. Also, the lock is not held across all
region servers but only per region: a region is locked only while we dump that region's manifest,
and other regions are not affected. I think this process is quick and will not impact the whole
cluster much.
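
To make this concrete, here is a minimal sketch of the locking I have in mind (the class and
method names are made up for illustration; this is not existing HBase code). A region move would
take the write lock, and the snapshot takes the read lock only while that one region's manifest
is dumped:

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Hypothetical illustration only -- not actual HBase code.
    class RegionSnapshotLock {
        // One such lock would exist per region, not one for the whole cluster.
        private final ReentrantReadWriteLock moveLock = new ReentrantReadWriteLock();

        // A region move would hold the write lock for its duration.
        void beginMove() { moveLock.writeLock().lock(); }
        void endMove()   { moveLock.writeLock().unlock(); }

        // The snapshot holds the read lock only while dumping this region's
        // manifest; Get/Put/Delete never touch this lock at all.
        void dumpManifest(Runnable writeManifest) {
            moveLock.readLock().lock();
            try {
                writeManifest.run(); // write out the region's store-file list
            } finally {
                moveLock.readLock().unlock();
            }
        }
    }

Since the read lock is shared, a snapshot never blocks reads or writes, only a concurrent move
of that same region, and only for the duration of the manifest dump.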

bq.  Hard links would make it easier, but in a large cluster with thousands of regions each
with many hfiles and many column families, iterating over every store file could be prohibitively
expensive if we have to lock everything while doing it. 

Yes, I totally agree with you. Iterating over store files is not efficient. That's why I want
to find some other mechanism to manage the reference-counting problem, probably via .META.
data or ZooKeeper. I don't have a concrete solution yet. Any good ideas?
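
For example, ZooKeeper could track which snapshots still reference an hfile by keeping one child
znode per reference, so an hfile would be eligible for deletion only once its znode has no
children. A rough sketch (the /hbase/snapshot-refs path and the class name are purely
hypothetical, not a concrete proposal):

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical illustration only -- not a concrete proposal.
    class HFileRefCounter {
        private static final String ROOT = "/hbase/snapshot-refs"; // made-up path
        private final ZooKeeper zk;

        HFileRefCounter(ZooKeeper zk) { this.zk = zk; }

        // Record that a snapshot references an hfile: one child znode per ref.
        // (Real hfile names would need encoding, since znode names cannot contain '/'.)
        void addRef(String hfile, String snapshot)
                throws KeeperException, InterruptedException {
            String fileNode = ROOT + "/" + hfile;
            try {
                zk.create(fileNode, new byte[0],
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException e) {
                // hfile is already tracked, which is fine
            }
            zk.create(fileNode + "/" + snapshot, new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // An hfile may be deleted from the dfs only when nothing references it.
        boolean isReferenced(String hfile)
                throws KeeperException, InterruptedException {
            List<String> refs = zk.getChildren(ROOT + "/" + hfile, false);
            return !refs.isEmpty();
        }
    }

The point is that the delete-time check becomes a single ZooKeeper lookup per hfile, instead of
an iteration over every store file in the cluster while holding locks.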

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Priority: Minor
>
> Having an option to take a snapshot of a table would be very useful in production.
> What I would like this option to do is merge all the data into one or more files stored in
> the same folder on the dfs. This way we could save data in case of a software bug in hadoop
> or user code.
> The other advantage would be the ability to export a table to multiple locations. Say I had
> a read_only table that must be online. I could take a snapshot of it when needed, export it
> to a separate data center, and have it loaded there; then I would have it online at multiple
> data centers for load balancing and failover.
> I understand that hadoop removes the need for backups to protect from failed servers, but
> this does not protect us from software bugs that might delete or alter data in ways we did
> not plan. We should have a way to roll back a dataset.
