hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96
Date Sun, 03 Jun 2012 19:05:23 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288238#comment-13288238 ]

Jonathan Hsieh commented on HBASE-6055:


Thanks for answering the questions. A strong +1 for doing the simplest HBase timestamp-based
approach first, and then looking into the more complicated version as an option afterwards.
Maybe start a sub-issue for the point-in-time approach and move discussion there? (I still
have questions about it; it might be better to ask them there.)

The main use case I care about is the ability to quickly "snapshot" without downtime and to
quickly recover it (ideally with no downtime, but possibly with a short downtime window).
Although it is a "sloppy snapshot", it is conceptually pretty simple to define, and I think
the caveats are fairly well understood. I don't expect stronger consistency guarantees than
what HBase currently offers, but I do expect something better (cheaper/faster) than the
current closest thing, which is CopyTable.
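
To make the "sloppy snapshot" idea concrete, here is a minimal sketch, assuming the timestamp-based approach means that each region independently keeps only the cells whose timestamp is at or before a chosen snapshot time. The `Cell` tuple and the `snapshot_view` function are hypothetical illustrations, not HBase APIs.

```python
from typing import Iterable, List, NamedTuple


class Cell(NamedTuple):
    # Hypothetical stand-in for an HBase cell; not the real HBase Cell/KeyValue API.
    row: bytes
    column: bytes
    timestamp: int  # HBase timestamps are conventionally epoch milliseconds
    value: bytes


def snapshot_view(cells: Iterable[Cell], snapshot_ts: int) -> List[Cell]:
    """Return the cells a timestamp-based snapshot taken at snapshot_ts would keep.

    Assumption: the snapshot contains every cell with timestamp <= snapshot_ts
    and nothing newer. Each region applies this cut on its own, with no
    cross-region write barrier, which is why the result is only "sloppy"
    rather than a globally consistent point-in-time image.
    """
    return [c for c in cells if c.timestamp <= snapshot_ts]
```

Under this assumption no writes need to be blocked while the cut is taken, which fits the no-downtime goal. The trade-off is the weaker cross-region consistency that the comment accepts.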

I have a bunch of new questions, some just asking for precision and some for clarification.
It might be helpful to define terms at the beginning of the doc so usage stays consistent.

- Hm... how do you restore a snapshot from reference files if it hasn't been scanned/copied
yet? Does that require a scan/copy "materialization" of the snapshot first? (That means a
slower restore, but it would likely be simplest for a first cut.)
- Snapshot restore needs to be "transactional" like snapshotting, right?
- What is "export"? Is this taking a snapshot, the materialization, the snapshot restore,
or something else?
- If we restore snapshots to the same HBase instance, the dir structure probably needs
.regioninfo files as well (they contain the region start/end key info needed to reconstitute
META later).
- Is restoring to a separate instance in scope? If so, bulk loads can be expensive: if regions
don't line up, there will be a lot of splitting. Again, keeping the regioninfos and the
snapshot's splits may be worthwhile.
- Where do the materialized versions of the snapshot reference files end up? In the snapshot
dirs? Elsewhere?
-- This potentially gets a little trickier with markers as opposed to log rolls.
-- The HLog will have edits from regions not relevant to the table's regions. Not a huge
problem, but maybe an optimization would be for the materialization step to do an "offline
HLog split/flush" that keeps only the data relevant to this table/region?
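
The offline HLog split/flush idea from the last bullet can be sketched as follows. The sketch assumes each snapshot region records its table's start and end keys, which is the information the .regioninfo files carry. It follows HBase's convention that a region covers [start key, end key), with an empty end key marking the last region. `SnapshotRegion`, `LogEdit`, and `split_for_table` are hypothetical names used for illustration; they are not HBase classes.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, NamedTuple


class SnapshotRegion(NamedTuple):
    # Hypothetical summary of what a .regioninfo file provides.
    encoded_name: str
    start_key: bytes  # inclusive; b"" means "from the first row"
    end_key: bytes    # exclusive; b"" means "through the last row"


class LogEdit(NamedTuple):
    # Hypothetical HLog entry reduced to the fields this sketch needs.
    table: str
    row: bytes
    value: bytes


def split_for_table(edits: List[LogEdit], table: str,
                    regions: List[SnapshotRegion]) -> Dict[str, List[LogEdit]]:
    """Keep only `table`'s edits and bucket them by snapshot region."""
    ordered = sorted(regions, key=lambda r: r.start_key)
    starts = [r.start_key for r in ordered]
    out: Dict[str, List[LogEdit]] = defaultdict(list)
    for edit in edits:
        if edit.table != table:
            continue  # edits belonging to other tables are dropped
        # Index of the last region whose start key is <= the row key.
        i = bisect.bisect_right(starts, edit.row) - 1
        if i < 0:
            continue  # row sorts before the first recorded region
        region = ordered[i]
        if region.end_key and edit.row >= region.end_key:
            continue  # row falls in a gap between recorded regions
        out[region.encoded_name].append(edit)
    return dict(out)
```

Bucketing by the recorded start/end keys also keeps the snapshot's region boundaries available at restore time. This matters for the separate-instance case above, where mismatched regions would otherwise lead to a lot of splitting.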

> Snapshots in HBase 0.96
> -----------------------
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>         Attachments: Snapshots in HBase.docx
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically
> changed, opening as a new ticket.

This message is automatically generated by JIRA.

