hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-50) Snapshot of table
Date Wed, 11 Aug 2010 04:48:26 GMT

    [ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897136#action_12897136 ]

HBase Review Board commented on HBASE-50:

Message from: stack@duboce.net

This is an automatically generated e-mail. To reply, visit:

I reviewed the first page of diffs.  I'll do the second page tomorrow. (This is a lovely big patch,
Li -- so far, so good... keep up the good work.)


    This is fine for an hbase that is a fresh install, but what about the case where the data has
been migrated from an older hbase version? It won't have this column family in .META.  We
should make a little migration script that adds it, or, on start of the new version, check for it
and, if not present, create it.
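The check-on-start idea could be sketched roughly like this. This is only an illustration: plain strings stand in for the real .META. descriptor, and the family name "snapshot" and the helper `ensureFamily` are made up, not taken from the patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of "check for the family on start, create if missing".
// A Set of family names stands in for the actual .META. table descriptor.
public class MetaFamilyCheck {
    static final String SNAPSHOT_FAMILY = "snapshot"; // assumed family name

    /** Returns true if the family had to be added (i.e. migrated data). */
    static boolean ensureFamily(Set<String> metaFamilies) {
        if (metaFamilies.contains(SNAPSHOT_FAMILY)) {
            return false; // fresh install, or already migrated
        }
        // In the real code this would be an admin-style addColumn call.
        metaFamilies.add(SNAPSHOT_FAMILY);
        return true;
    }

    public static void main(String[] args) {
        Set<String> families = new HashSet<>();
        families.add("info"); // pre-snapshot .META. only had "info"
        System.out.println(ensureFamily(families)); // first start: adds it
        System.out.println(ensureFamily(families)); // later starts: no-op
    }
}
```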




    Maybe the exception should be called TablePartiallyOpenException. (Not important.  Minor.)


    Can the snapshot name be empty and then we'll make one up?


    Sure, why not. It's stored in the filesystem?  If so, for sure use the same rule.


    For restore of the snapshot, do you use loadtable.rb or Todd's new bulkloading scripts?


    This looks like an improvement.


    What's this?  A different kind of reference?


    Do you need to specify the ordinals?  Won't they be these numbers anyway? Or is it in
case someone adds a new type of reference?


    Why not just use the ordinal?  http://download-llnw.oracle.com/javase/1.5.0/docs/api/java/lang/Enum.html#ordinal()
 And serialize the int?
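The ordinal round-trip would look something like the following sketch. The `Type` enum and its constants are invented for illustration; the caveat in the read path is also why explicitly assigned numbers can be safer than `ordinal()` if the declaration order might change.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of serializing an enum by its ordinal, as suggested above.
public class OrdinalSerde {
    enum Type { TOP, BOTTOM, ENTIRE } // hypothetical reference kinds

    static void write(DataOutput out, Type t) throws IOException {
        out.writeInt(t.ordinal()); // ordinal() follows declaration order
    }

    static Type read(DataInput in) throws IOException {
        // Caveat: silently wrong if constants are ever reordered or inserted.
        return Type.values()[in.readInt()];
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        write(new DataOutputStream(bos), Type.BOTTOM);
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(read(in)); // BOTTOM
    }
}
```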


    Hmm... is this good?  You are dropping some of the region name when you toString.  Do we
have to?


    This could be a problem when fellas migrate old data to use this new hbase.  If there
are References in the old data, then this deserialization will fail?  I'm fine w/ you creating
a new issue named something like "Migration from 0.20.x hbase to 0.90" and adding a note in
there that we need to consider this little issue.  Better though would be if the data were
able to migrate itself at runtime; i.e., recognize a boolean on the stream and then deserialize
the old style into the new, etc.
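The runtime-migration idea could be sketched as below. This assumes (hypothetically -- the real old format is not shown here) that old-format data began with a non-negative length int, so a negative sentinel unambiguously marks the new format; all names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hedged sketch of self-migrating deserialization: the new format is
// prefixed with a marker the old format could never produce.
public class MigratingReference {
    static final int NEW_FORMAT = -1; // assumed: old data starts with length >= 0

    static void writeNew(DataOutput out, String region, boolean topHalf)
            throws IOException {
        out.writeInt(NEW_FORMAT);  // marker old data cannot contain
        out.writeUTF(region);
        out.writeBoolean(topHalf); // the new reference-kind flag
    }

    /** Reads either the old (length + bytes) or the new (marked) format. */
    static String readRegion(DataInput in) throws IOException {
        int first = in.readInt();
        if (first == NEW_FORMAT) {
            return in.readUTF();       // new style
        }
        byte[] name = new byte[first]; // old style: 'first' was the length
        in.readFully(name);
        return new String(name, "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writeNew(new DataOutputStream(bos), "region-1", true);
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(readRegion(in)); // region-1
    }
}
```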




    Is this comment right?


    Why a random negative number?  Why not just leave it blank?


    It'd be cool to break these methods up so they were static, if possible, so you could unit
test them to ensure they do as advertised.
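As a toy illustration of the point: pure logic pulled into a static helper (here a made-up default snapshot name, echoing the "make one up" comment above) can be unit tested without standing up a cluster. Nothing here is from the patch.

```java
// Hypothetical example of extracting a testable static helper.
public class SnapshotNames {
    /** Defaults the snapshot name when the user supplied an empty one. */
    static String defaultName(String table, long nowMillis) {
        return "snapshot-" + table + "-" + nowMillis;
    }

    public static void main(String[] args) {
        System.out.println(defaultName("t1", 42L)); // snapshot-t1-42
    }
}
```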


    Check the result.  It may not have worked.  If it didn't, it deserves a WARN at least.


    You might want to check the returns from these methods.
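The check-the-return pattern amounts to something like this sketch, where `java.io.File` stands in for the filesystem calls the review refers to (the real code would use the Hadoop FileSystem API; names here are illustrative).

```java
import java.io.File;
import java.io.IOException;
import java.util.logging.Logger;

// Sketch of checking a boolean-returning call instead of dropping it.
public class CheckedDelete {
    private static final Logger LOG = Logger.getLogger("snapshot");

    /** Deletes f, logging a WARN (and returning false) on failure. */
    static boolean deleteWithWarn(File f) {
        boolean ok = f.delete();
        if (!ok) {
            LOG.warning("Failed to delete " + f); // at least a WARN
        }
        return ok;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("snap", ".tmp");
        System.out.println(deleteWithWarn(tmp));                  // true
        System.out.println(deleteWithWarn(new File("/no/such"))); // false
    }
}
```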


    Don't bother warning, I'd say... just throw.


    Yeah, just throw... that'll show in logs anyway (I believe)


    I'm with Ted on this one. Need to do something about it, or else it'll annoy till the end
of time.


    This looks like it should be a general utility method (Not important)


    If the table were big, this could be prohibitively expensive?  A single-threaded copy of all
of a table's data?  We could complement this with an MR-based restore, something that did the copy
using MR?


    This looks like a class that you could write a unit test for?

- stack

> Snapshot of table
> -----------------
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report
V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png
> Having an option to take a snapshot of a table would be very useful in production.
> What I would like to see this option do is merge all the data into one or more
files stored in the same folder on the dfs. This way we could save data in case of a software
bug in hadoop or user code.
> The other advantage would be being able to export a table to multiple locations. Say I had
a read_only table that must be online. I could take a snapshot of it when needed, export
it to a separate data center, and have it loaded there; then I would have it online at multiple
data centers for load balancing and failover.
> I understand that hadoop takes the need out of having backups to protect from failed
servers, but this does not protect us from software bugs that might delete or alter data
in ways we did not plan. We should have a way to roll back a dataset.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
