hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-50) Snapshot of table
Date Wed, 11 Aug 2010 18:49:29 GMT

    [ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897382#action_12897382

HBase Review Board commented on HBASE-50:

Message from: stack@duboce.net

This is an automatically generated e-mail. To reply, visit:

This is looking really great Li.  If you can address the comments below in a new version,
lets get this patch committed.

Unfortunately, the master rewrite is going to change a bunch of the stuff that this patch
depends on.  For example, BaseScanner is going away.  But, thats not your issue.

What about scaling?  The only problematic issue I saw  was copy of storefiles on restore.
 We should file an issue to do that via a MR job.




    Why let this out?


    Want to remove this or enable the assertion?  One or the other I'd say rather than this.


    This class looks good.


    Its a pity this class is named so.  We're about to bring in a new patch that redoes the
zk stuff -- breaks it up into pieces each with a singular purpose; e.g. tracking root location
or tracking meta region server -- and unfortunately the pattern is to name these purposed
classes *Tracker.  There'll be a clash of this kinda Tracker and the new zk Trackers.  Not
important, just saying in case you have another name in mind for this class.


    Can you make this a configuration? int maxRetries = Configuration.getInt("hbase.snapshot.retries",
3); or something?  It doesn't have to go into hbase-default.xml


    Yeah, can the max retries be made into a data member rather than hardcoded everywhere
in this class?


    Looks like you need more explaination here?  This is a special case right where the table
is offline and we're asked to make a snapshot of it?


    Its not a backup, its creating references, right?


    This looks like an improvement, making it static so can be used more generally.


    And flushing is disabled at this point too, right?  Compactions? (Good).


    If snapshot fails, do we have to do cleanup?




    This looks good.


    Call this Snapshotter instead?


    Do we have to do this down at the Store level?  Coud we move it up to Region or up to
the RegionServer itself?  It already has an HTable instance.


    Why you have to pass the reference?  It wasn't needed previously?


    Does this stuff belong in here in this general utility class?  Should it be polluted with
References?  Should this stuff be over in io package where the Reference is or static methods
on Reference?


    What about a test of restore from snapshot?  Is there one?  I dont' see it?


    Good test


    There is getTestDir(final String subdirName) FYI


    Looks good.



- stack

> Snapshot of table
> -----------------
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report
V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png
> Havening an option to take a snapshot of a table would be vary useful in production.
> What I would like to see this option do is do a merge of all the data into one or more
files stored in the same folder on the dfs. This way we could save data in case of a software
bug in hadoop or user code. 
> The other advantage would be to be able to export a table to multi locations. Say I had
a read_only table that must be online. I could take a snapshot of it when needed and export
it to a separate data center and have it loaded there and then i would have it online at multi
data centers for load balancing and failover.
> I understand that hadoop takes the need out of havening backup to protect from failed
servers, but this does not protect use from software bugs that might delete or alter data
in ways we did not plan. We should have a way we can roll back a dataset.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message