hbase-issues mailing list archives

From "Matteo Bertozzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
Date Thu, 03 Apr 2014 00:56:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958402#comment-13958402 ]

Matteo Bertozzi commented on HBASE-7912:
----------------------------------------

{quote}The snapshot information and data are tightly coupled and stored with the existing HBase cluster -- in-place backup. We want to back up HBase data to a FileSystem across clusters and possibly to other storage media or servers.{quote}
How does ExportSnapshot not already enable this?

{quote}Full backup and restore. Backup will first invoke HBase snapshot and export snapshot
internally. 
The full backup can be restored with HBase bulk import utility.{quote}
What does "internally" mean?

{quote}Incremental backup uses WALs to capture the data changes since the last full or incremental backup. We execute a log roll across region servers to track the WALs that need to be in the backup. Then a distributed copy is used to move the physical files to the target FileSystem.{quote}
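The WAL tracking step in the quote above can be pictured with a minimal Python sketch. It assumes, hypothetically, that each closed WAL is known by its region server and creation timestamp, and that each backup records, per region server, the timestamp of the log roll it performed. `WalFile` and `wals_for_incremental_backup` are illustrative names, not part of HBase, and the actual distributed copy is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class WalFile:
    """Hypothetical description of one WAL file on a region server."""
    server: str      # region server that wrote the WAL
    path: str        # location of the WAL file
    created_ts: int  # creation time in ms (assumed to be known)


def wals_for_incremental_backup(
    wals: Iterable[WalFile],
    prev_roll_ts: Dict[str, int],
    curr_roll_ts: Dict[str, int],
) -> List[WalFile]:
    """Pick the WALs holding edits between the previous backup's log roll and the current one.

    A log roll closes the active WAL and opens a new one whose creation time is the
    roll time. Per server, the WALs opened at or after the previous roll and before
    the current roll therefore hold the edits since the previous backup; the WAL
    opened by the current roll is still being written and belongs to the next backup.
    """
    selected = []
    for wal in wals:
        if wal.server not in curr_roll_ts:
            # Assumption: no roll recorded for this server in the current backup, so skip it.
            continue
        # A server with no previous roll (e.g. one that joined later) contributes
        # every WAL older than the current roll.
        start = prev_roll_ts.get(wal.server, 0)
        if start <= wal.created_ts < curr_roll_ts[wal.server]:
            selected.append(wal)
    # Deterministic order: by server, then by creation time.
    return sorted(selected, key=lambda w: (w.server, w.created_ts))
```

The selected paths are what a distributed copy would then move to the target FileSystem; note that this selection is per server, not per table, which is exactly why WALs can carry edits for tables outside the backup set.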
When the logs are copied, are they also split to avoid sending tables that are not part of the backup, or is it just a file copy?

Is a "Full backup" just a snapshot?

Where will the backup manifests be stored? I guess they must stay on the source cluster to allow you to implement the "wal cleaner" that keeps logs around for incremental backup.
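The manifest-driven "wal cleaner" mentioned above could reduce to a predicate like the following Python sketch. It is hypothetical: it assumes each backup set's manifest, kept on the source cluster, records per region server the log-roll timestamp of that set's last completed backup, and that a WAL may be deleted only once every backup set has moved past it. In HBase such a check would sit in the master's pluggable log cleaner chain; here it is modeled as a plain function with an illustrative name.

```python
from typing import Dict, Iterable


def is_wal_deletable(
    server: str,
    created_ts: int,
    manifests: Iterable[Dict[str, int]],
) -> bool:
    """Return True if no backup set still needs this WAL for an incremental backup.

    Each manifest maps region server -> timestamp of the log roll taken by that
    backup set's last completed backup (hypothetical layout). A WAL opened before
    that roll was closed by it at the latest, so its edits were already copied.
    """
    for last_roll in manifests:
        if server not in last_roll:
            # This backup set never rolled logs on this server (e.g. the server
            # joined later), so its next incremental backup may need this WAL: keep it.
            return False
        if created_ts >= last_roll[server]:
            # Written after this set's last backup: still pending, keep it.
            return False
    # Every backup set has moved past this WAL (or no backup sets exist).
    return True
```

The retention horizon is thus set by the slowest backup set, which is why the manifests have to be readable on the source cluster.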

How do you decide how long to keep logs for a possible upcoming incremental backup?

How do you decide when it is better to take an incremental backup as a full backup (e.g. after a major compaction) vs. just keeping the WALs?

From the document, it looks like you are trying to build a separate system that produces the same result as the current snapshot (aside from the "extensions"). I think you should aim to "merge" the snapshot code into the Backup Manager & co., since, as far as I understand, a full backup basically gives you the snapshot, and you already partly rely on ExportSnapshot.

> HBase Backup/Restore Based on HBase Snapshot
> --------------------------------------------
>
>                 Key: HBASE-7912
>                 URL: https://issues.apache.org/jira/browse/HBASE-7912
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>         Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we have completed the implementation of our backup/restore solution and would like to share it with the community through this jira.
> We are leveraging the existing HBase snapshot feature to provide a general solution for common users. Our full backup uses a snapshot to capture metadata locally and ExportSnapshot to move the data to another cluster; the incremental backup uses an offline WALPlayer to back up HLogs; we also leverage distributed log roll and flush to improve performance. Added-on features such as convert, merge, progress report, and CLI commands let a common user back up HBase data without in-depth knowledge of HBase. Our solution also contains some usability features for enterprise users.
> The detailed design document and CLI command documentation will be attached to this jira. We plan to use 10~12 subtasks to share each of the following features, and to document the implementation details in the subtasks:
> * *Full Backup* : provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental backup)
> * *distributed* log roll and distributed flush
> * Backup *Manifest* and history
> * *Incremental* backup: built on top of a full backup, e.g. as a daily/weekly backup
> * *Convert* incremental backup WAL files into HFiles
> * *Merge* several backup images into one (e.g. merge weekly into monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
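The *offline-WALPlayer* and *Convert* items turn WAL edits into HFiles that can later be bulk loaded. A hypothetical Python sketch of the core of such a conversion: keep only edits for the tables in the backup set, group them by table and column family (each HFile holds a single column family), and sort each group in HBase cell order, i.e. row and qualifier ascending byte-wise with the newest timestamp first, which is the order an HFile must be written in. `WalEdit` and `wal_edits_to_sorted_cells` are illustrative names; real WAL parsing, delete markers, and HFile writing are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class WalEdit:
    """Hypothetical flattened WAL entry: one cell written to one table."""
    table: str
    row: bytes
    family: bytes
    qualifier: bytes
    ts: int
    value: bytes


def wal_edits_to_sorted_cells(
    edits: Iterable[WalEdit], backup_tables: Set[str]
) -> Dict[Tuple[str, bytes], List[WalEdit]]:
    """Group edits per (table, family), dropping tables outside the backup set, in cell order."""
    per_store: Dict[Tuple[str, bytes], List[WalEdit]] = defaultdict(list)
    for edit in edits:
        # A WAL mixes all tables hosted by its region server, so filter to the backup set.
        if edit.table in backup_tables:
            per_store[(edit.table, edit.family)].append(edit)
    for cells in per_store.values():
        # Python compares bytes lexicographically as unsigned values;
        # negating the timestamp puts the newest version first.
        cells.sort(key=lambda c: (c.row, c.qualifier, -c.ts))
    return dict(per_store)
```

Each returned list corresponds to the cells of one would-be HFile for one column family of one table.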
> *-------------------------------------------------------------------------------------------------------------*
> *Below is the original description, kept here as the history of the design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there have been many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency.
>  
> A common practice for backup and restore in databases is to first take a full baseline backup, and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store a massive amount of data, so combining full backups with incremental backups has tremendous benefits for HBase as well. The following is a typical scenario for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase.
> # The user schedules periodic incremental backups to capture the changes since the full backup, or since the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s). Then the incremental backups up to the desired point in time are applied on top of the full backup.
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but use bulk load of HFiles for fast incremental restore.
> * Support backup and restore of a single table or a set of tables, and at the column family level.
> * Restore to different table names.
> * Support adding tables or CFs to the backup set without interrupting the incremental backup schedule.
> * Support rolling up/combining incremental backups into bigger incremental backups that cover a longer period.
> * Unified command line interface for all the above.
> The solution will support HBase backup to a FileSystem, either on the same cluster or across clusters. It has the flexibility to support backup to other devices and servers in the future.
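Steps 3 and 4 of the scenario above amount to choosing a chain of backup images. A minimal Python sketch, assuming hypothetically that every backup image records its type and the point in time it covers up to, and that each incremental backup builds on the backup taken just before it: the restore uses the latest full backup at or before the target time, followed in order by the incremental backups after it that do not go past the target. The restored state is therefore as of the last image in the chain, which can be earlier than the requested time. `BackupImage` and `restore_chain` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class BackupImage:
    """Hypothetical backup history entry."""
    backup_id: str
    kind: str    # "full" or "incremental"
    end_ts: int  # point in time (ms) the image covers up to


def restore_chain(images: Iterable[BackupImage], target_ts: int) -> List[BackupImage]:
    """Return [latest full backup, incremental, incremental, ...] for a point-in-time restore."""
    # Only images that do not go past the requested point in time are usable.
    usable = sorted(
        (img for img in images if img.end_ts <= target_ts),
        key=lambda img: img.end_ts,
    )
    # Find the most recent full backup among them.
    full_idx: Optional[int] = None
    for idx, img in enumerate(usable):
        if img.kind == "full":
            full_idx = idx
    if full_idx is None:
        raise ValueError(f"no full backup at or before {target_ts}")
    # Everything after the last full backup is an incremental built on top of it.
    return usable[full_idx:]
```

The full image in the chain would be restored first (for example from the snapshot), and the incremental images would then be applied in order, e.g. by bulk loading their converted HFiles.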
 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
