Date: Fri, 22 Feb 2013 23:02:12 +0000 (UTC)
From: "Richard Ding (JIRA)"
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot and FileLink

Richard Ding created HBASE-7912:
-----------------------------------

Summary: HBase Backup/Restore Based on HBase Snapshot and FileLink
Key: HBASE-7912
URL: https://issues.apache.org/jira/browse/HBASE-7912
Project: HBase
Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding

There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). HBase has recently gained several new features, for example FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that uses these new features to achieve better performance and consistency.

A common practice for backup and restore in databases is to first take a full baseline backup, and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store a massive amount of data. Combining full backups with incremental backups has tremendous benefit for HBase as well.

The following is a typical scenario for full and incremental backup.
# The user takes a full backup of a table or a set of tables in HBase.
# The user schedules periodic incremental backups to capture the changes since the full backup, or since the last incremental backup.
# The user needs to restore table data to a past point in time.
# The full backup is restored to the table(s) or to different table name(s). Then the incremental backups up to the desired point in time are applied on top of the full backup.

We would support the following key features and capabilities.
* Full backup uses HBase snapshot to capture HFiles.
* Incremental backup uses HBase WALs to capture changes, while incremental restore uses bulk load of HFiles for speed.
* Support a single table or a set of tables, and column family level backup and restore.
* Restore to different table names.
* Support adding additional tables or CFs to the backup set without interrupting the incremental backup schedule.
* Support rollup/combining of incremental backups into longer-period, larger incremental backups.
* Unified command line interface for all of the above.

The solution will support HBase backup to FileSystem, either on the same cluster or across clusters. It has the flexibility to support backup to other devices and servers in the future.
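
To make the restore scenario above concrete (restore the full backup, then apply the incremental backups up to the desired point in time), here is a minimal Python sketch of how the set of backups to replay might be selected. It is an illustration only and not part of the proposal: the `Backup` record, its fields, and the `restore_chain` function are hypothetical names, and it assumes each backup is tagged with the timestamp it covers up to.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    # Hypothetical catalog record; field names are assumptions for illustration.
    backup_id: str
    kind: str    # "full" or "incremental"
    end_ts: int  # timestamp the backup covers up to (e.g. epoch millis)


def restore_chain(backups: List[Backup], target_ts: int) -> List[Backup]:
    """Return the full backup to restore first, followed by the
    incremental backups to apply on top of it, oldest first."""
    # Only backups that end at or before the requested time are usable.
    eligible = sorted(
        (b for b in backups if b.end_ts <= target_ts),
        key=lambda b: b.end_ts,
    )
    fulls = [b for b in eligible if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the requested time")
    base = fulls[-1]  # most recent full backup before the target
    # Incrementals taken after the base, in the order they must be applied.
    incrementals = [
        b for b in eligible
        if b.kind == "incremental" and b.end_ts > base.end_ts
    ]
    return [base] + incrementals


history = [
    Backup("full-1", "full", 100),
    Backup("incr-1", "incremental", 200),
    Backup("incr-2", "incremental", 300),
    Backup("incr-3", "incremental", 400),
]
chain = restore_chain(history, target_ts=350)
print([b.backup_id for b in chain])  # prints ['full-1', 'incr-1', 'incr-2']
```

Under these assumptions the restore lands on the latest backup boundary at or before the requested time. Restoring to an arbitrary instant between two incremental backups would need additional handling that this sketch does not cover.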