hbase-issues mailing list archives

From "Matteo Bertozzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot and FileLink
Date Sun, 24 Feb 2013 10:20:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585340#comment-13585340 ]

Matteo Bertozzi commented on HBASE-7912:
----------------------------------------

{code}
Still because of compaction, hour by hour, eventually the same target will be full of duplicate
data (different files that contain the same data). Or did I misunderstand anything here?
{code}
True, this is a problem with the current way the fs and compaction are used, but it can be
solved in the future. I think I mentioned data deduplication as the fix for this case somewhere
in relation to HBASE-7806.
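To make the compaction problem concrete: if every hourly backup copies whole store files, a compacted file that only rewrites cells already backed up gets copied again. The toy model below is not HBase code and its names (`DedupBackupTarget`, `cell_digest`) are made up; it deduplicates at the cell level by content hash, which is just one hypothetical way to do the data deduplication mentioned above.

```python
import hashlib

# Hypothetical model: a store file is a tuple of cells (row, column, timestamp, value).
Cell = tuple[str, str, int, str]


def cell_digest(cell: Cell) -> str:
    """Content hash of a single cell; identical cells give identical digests."""
    return hashlib.sha256(repr(cell).encode("utf-8")).hexdigest()


class DedupBackupTarget:
    """Toy backup target that keeps each distinct cell once, whichever file holds it."""

    def __init__(self) -> None:
        self.cells: dict[str, Cell] = {}
        self.cells_copied_naively = 0  # what copying whole files would have stored

    def backup(self, store_files: list[tuple[Cell, ...]]) -> int:
        """Back up one round of store files; return the number of newly stored cells."""
        new_cells = 0
        for store_file in store_files:
            for cell in store_file:
                self.cells_copied_naively += 1
                digest = cell_digest(cell)
                if digest not in self.cells:
                    self.cells[digest] = cell
                    new_cells += 1
        return new_cells
```

For example, back up files `f1` and `f2` in hour 1; in hour 2 compaction has merged them into `f3` and a new file `f4` holds one fresh cell. A file-level copy stores 5 cells in total, while the deduplicating target stores only 3.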

If we have a log per table/region, as Ted mentioned, I think that snapshot + WALs is a good
short-term approach. In the long term, though, the snapshot alone should be enough, since the
only remaining problem is the data duplicated by compaction. Other consistency models can also
be plugged into the snapshot in place of the current flush-and-snapshot one, and multi-table
snapshots and snapshots of specified families are further things that can be added. As far as
I remember, these were mentioned at the beginning of development but were skipped in the first
step in order to ship the basic functionality.
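In the snapshot + WALs approach, each incremental backup amounts to picking the WAL edits of the backed-up tables that were written after the previous backup. A minimal sketch, assuming a flat in-memory list of hypothetical `WalEdit` records (real HBase WAL entries carry much more) and using the write timestamp as the log position:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalEdit:
    """Hypothetical, heavily simplified WAL entry."""
    table: str
    row: str
    column: str
    ts: int  # write time, used here as the log position
    value: str


def incremental_edits(wal: list[WalEdit], tables: set[str],
                      since_ts: int, until_ts: int) -> list[WalEdit]:
    """Edits for the backed-up tables in the half-open window (since_ts, until_ts]."""
    return [e for e in wal if e.table in tables and since_ts < e.ts <= until_ts]
```

With a shared log, every incremental has to scan and filter edits of unrelated tables; with a log per table/region, the `e.table in tables` check becomes simply reading only the relevant logs.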
                
> HBase Backup/Restore Based on HBase Snapshot and FileLink
> ---------------------------------------------------------
>
>                 Key: HBASE-7912
>                 URL: https://issues.apache.org/jira/browse/HBASE-7912
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>
> There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, HBase has gained many advancements and new features, for example FileLink, Snapshot, and the Distributed Barrier Procedure. This is a proposal for a backup/restore solution that uses these new features to achieve better performance and consistency.
>  
> A common backup and restore practice in databases is to first take a full baseline backup and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store a massive amount of data, so combining full backups with incremental backups has tremendous benefits for HBase as well. The following is a typical scenario for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodic incremental backups to capture the changes since the full backup or since the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s). Then the incremental backups up to the desired point in time are applied on top of the full backup.
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but use bulk loading of HFiles for fast incremental restore.
> * Support backup and restore of a single table or a set of tables, as well as at the column family level.
> * Restore to different table names.
> * Support adding tables or CFs to the backup set without interrupting the incremental backup schedule.
> * Support rolling up (combining) incremental backups into bigger incremental backups that cover longer periods.
> * Unified command line interface for all the above.
> The solution will support HBase backup to a FileSystem, either on the same cluster or across clusters. It is flexible enough to support backup to other devices and servers in the future.
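The full-plus-incremental scenario and the rollup feature in the proposal can be modeled in a few lines. This is a toy, in-memory sketch, not the proposed implementation: `Backup`, `restore` and `rollup` are hypothetical names, a backup is assumed to be a map from (row, column) to the latest value with `None` marking a delete, and point-in-time restore here only has the granularity of the backup boundaries.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical model: (row, column) -> latest value; None marks a delete.
Cells = dict[tuple[str, str], Optional[str]]


@dataclass
class Backup:
    end_ts: int  # captures changes up to and including this time
    cells: Cells = field(default_factory=dict)


def restore(full: Backup, incrementals: list[Backup],
            target_ts: int) -> dict[tuple[str, str], str]:
    """Restore the full backup, then apply incrementals in time order up to target_ts."""
    if full.end_ts > target_ts:
        raise ValueError("full backup is newer than the requested point in time")
    table = {k: v for k, v in full.cells.items() if v is not None}
    for inc in sorted(incrementals, key=lambda b: b.end_ts):
        if inc.end_ts <= full.end_ts:
            continue  # already covered by the full backup
        if inc.end_ts > target_ts:
            break  # this and later incrementals overshoot target_ts
        for key, value in inc.cells.items():
            if value is None:
                table.pop(key, None)
            else:
                table[key] = value
    return table


def rollup(incrementals: list[Backup]) -> Backup:
    """Combine consecutive incrementals into one covering the whole period."""
    if not incrementals:
        raise ValueError("nothing to roll up")
    merged: Cells = {}
    for inc in sorted(incrementals, key=lambda b: b.end_ts):
        merged.update(inc.cells)  # later change to a cell wins; deletes are kept
    return Backup(max(b.end_ts for b in incrementals), merged)
```

Rolling up is safe in this model because applying the merged incremental gives the same result as applying its parts in order, provided deletes are kept as markers rather than dropped.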
 
