hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-50) Snapshot of table
Date Mon, 10 May 2010 18:16:42 GMT

https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865853#action_12865853

Jonathan Gray commented on HBASE-50:

Also, rather than creating a mock Master and mock RegionServer, it might be more useful to
just create classes that can be used by the real Master and RS processes.

For example, the Master class you have could become SnapshotManager or SnapshotMonitor.  The
RegionServer might become SnapshotExecutor.  These are just naming suggestions, but I would
move to making patches against the svn repository and building things in a way that will make
them easier to integrate after you get through early testing.

Does your current design have it so that all RegionServers wait for every other RS to be ready,
and then they concurrently perform their snapshots?  It seems like this could be dangerous
behavior.  We might want to stagger snapshots across the cluster, otherwise this process could
create too much load.
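To make the staggering idea concrete, here is a minimal sketch of how a coordinator might assign start offsets to RegionServers in waves instead of firing them all at once. The class and parameter names are hypothetical, not from the HBase codebase; it only illustrates the scheduling arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rather than every RegionServer starting its snapshot
// at the same instant, a coordinator assigns each one a staggered start
// offset so cluster-wide load is spread out.
public class StaggeredSnapshotPlan {

    // Assigns start offsets (in milliseconds) in round-robin waves of size
    // `concurrency`: the first `concurrency` servers start at t=0, the next
    // wave at t=staggerMs, and so on.
    public static List<Long> startOffsets(List<String> regionServers,
                                          int concurrency, long staggerMs) {
        List<Long> offsets = new ArrayList<>();
        for (int i = 0; i < regionServers.size(); i++) {
            offsets.add((long) (i / concurrency) * staggerMs);
        }
        return offsets;
    }
}
```

With five servers, a concurrency of 2, and a 1-second stagger, the waves start at 0, 0, 1000, 1000, and 2000 ms.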

The flow chart is great.  Could you add some notation for the znode names and mark which
steps will have more than one process performing them?  For example, each RS might make
the znode:  /snapshot/ready/RSNAME or some such.
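The ready-znode barrier above could look something like the following sketch. The ZooKeeper calls are replaced with an in-memory set so the coordination logic stands alone; the class name and the /snapshot/ready/ path are illustrative, taken only from the suggestion in this comment.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the "ready" barrier: each RegionServer creates a
// znode like /snapshot/ready/RSNAME, and the coordinator considers the
// barrier complete once every expected server has checked in.
public class ReadyBarrier {
    private final Set<String> expected;
    private final Set<String> ready = new HashSet<>();

    public ReadyBarrier(Set<String> expectedServers) {
        this.expected = new HashSet<>(expectedServers);
    }

    // Stands in for the RS creating its ready znode; returns the path it
    // would create.
    public String markReady(String serverName) {
        ready.add(serverName);
        return "/snapshot/ready/" + serverName;
    }

    // Coordinator side: true once every expected RS has checked in.
    public boolean allReady() {
        return ready.containsAll(expected);
    }
}
```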

As far as this thing working in failure scenarios, that could potentially be very complex
(or if designed in a region-centric view rather than regionserver-centric view maybe not so
bad).  Let's try to flesh out the requirements early/now.  If we decide we won't work under
failure, then let's make that explicit and ensure that under failure we can roll everything
back.  If we want it to work under failure (Master and/or RegionServer), let's talk about
it more now because the basic design could have big implications.

In your flow chart, once the snapshot starts it looks like an RS won't communicate until it
finishes.  If that RS is serving 1000 regions of the table being snapshotted, this could take
a long time.  Does the RS need to keep a timestamp updated to let the master/client know it's
still working (and maybe its progress)?  Also, are splits and load balancing blocked during
snapshot time across the whole cluster?
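One way to frame the liveness question above: the RS periodically publishes a progress heartbeat (e.g. into its snapshot znode), and the master treats the server as stalled if the heartbeat goes quiet for too long. A minimal sketch, with all names and the timeout value being illustrative assumptions:

```java
// Hypothetical sketch: a per-RS heartbeat the master can poll to tell a
// slow-but-alive snapshot apart from a dead RegionServer.
public class SnapshotHeartbeat {
    private long lastUpdateMs;
    private int regionsDone;

    public SnapshotHeartbeat(long nowMs) {
        this.lastUpdateMs = nowMs;
    }

    // RS side: called after each region is snapshotted.
    public void reportProgress(long nowMs, int regionsDone) {
        this.lastUpdateMs = nowMs;
        this.regionsDone = regionsDone;
    }

    // Master side: has the RS gone silent longer than timeoutMs?
    public boolean isStalled(long nowMs, long timeoutMs) {
        return nowMs - lastUpdateMs > timeoutMs;
    }

    public int getRegionsDone() {
        return regionsDone;
    }
}
```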

I think it would be useful to generate a design document.  This would include your current
flow chart as well as some of the stuff in comments on this jira.  I'd like to have something
we can continue to iterate on rather than just comments.

Great work so far!

> Snapshot of table
> -----------------
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: snapshot-flowchart.png, snapshot-src.zip
> Having an option to take a snapshot of a table would be very useful in production.
> What I would like to see this option do is merge all the data into one or more
files stored in the same folder on the dfs. This way we could save data in case of a software
bug in hadoop or user code. 
> The other advantage would be being able to export a table to multiple locations. Say I had
a read_only table that must be online. I could take a snapshot of it when needed and export
it to a separate data center and have it loaded there, and then I would have it online at multiple
data centers for load balancing and failover.
> I understand that hadoop takes away the need for backups to protect from failed
servers, but this does not protect us from software bugs that might delete or alter data
in ways we did not plan. We should have a way to roll back a dataset.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
