Date: Thu, 17 Mar 2016 13:16:33 +0000 (UTC)
From: "Jianwei Cui (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-15469) Take snapshot by family

     [ https://issues.apache.org/jira/browse/HBASE-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jianwei Cui updated HBASE-15469:
--------------------------------
    Attachment: HBASE-15469-v1.patch

> Take snapshot by family
> -----------------------
>
>                 Key: HBASE-15469
>                 URL: https://issues.apache.org/jira/browse/HBASE-15469
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 2.0.0
>            Reporter: Jianwei Cui
>         Attachments: HBASE-15469-v1.patch
>
>
> In our production environment, there are some 'wide' tables in our offline cluster.
> Each 'wide' table has a number of families, and different applications access different families of the table through MapReduce. When an application starts to provide online service, we need to copy the families it needs from the offline cluster to the online cluster. For future writes, inter-cluster replication supports setting families for a table, so we can use it to copy future edits for the needed families. For existing data, we can take a snapshot of the table on the offline cluster, then use {{ExportSnapshot}} to copy the snapshot to the online cluster and clone the snapshot there. However, we can only take a snapshot of the whole table, which includes many families the application does not need, and this leads to unnecessary data copying. I think it would be useful to support taking a snapshot by family, so that we copy only the needed data.
> Possible solution to support this:
> 1. Add a family names field to the protobuf definition of {{SnapshotDescription}}.
> 2. Allow setting families when taking a snapshot in the hbase shell, such as:
> {code}
> snapshot 'tableName', 'snapshotName', 'FamilyA', 'FamilyB', {SKIP_FLUSH => true}
> {code}
> 3. Add the family names to {{SnapshotDescription}} on the client side.
> 4. Read the family names from {{SnapshotDescription}} in the Master/RegionServer, and keep only the requested families when taking the snapshot for a region.
> Discussions and suggestions are welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
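
To make step 4 of the proposal concrete, here is a minimal Java sketch of the per-region filtering it describes. It is not HBase code and does not reflect the attached patch: `FamilySnapshotSketch`, `SnapshotRequest`, and `selectStoreFiles` are hypothetical names, store files are modelled as plain strings, and treating an empty family list as "snapshot all families" is an assumption made to keep whole-table snapshots working unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FamilySnapshotSketch {

    /** Hypothetical stand-in for a snapshot request; NOT the real SnapshotDescription. */
    static final class SnapshotRequest {
        final String table;
        final String name;
        final Set<String> families; // assumption: empty means "all families"

        SnapshotRequest(String table, String name, Collection<String> families) {
            this.table = table;
            this.name = name;
            this.families = Collections.unmodifiableSet(new LinkedHashSet<>(families));
        }
    }

    /**
     * Returns the store files of one region that the snapshot would reference,
     * keeping only the families named in the request.
     */
    static Map<String, List<String>> selectStoreFiles(
            SnapshotRequest request, Map<String, List<String>> storeFilesByFamily) {
        Map<String, List<String>> selected = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : storeFilesByFamily.entrySet()) {
            boolean wanted = request.families.isEmpty()
                    || request.families.contains(entry.getKey());
            if (wanted) {
                // Copy the list so the result does not alias the region's own state.
                selected.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Illustrative region contents; the file names are made up.
        Map<String, List<String>> region = new LinkedHashMap<>();
        region.put("FamilyA", Arrays.asList("hfile-a1", "hfile-a2"));
        region.put("FamilyB", Arrays.asList("hfile-b1"));
        region.put("FamilyC", Arrays.asList("hfile-c1"));

        SnapshotRequest request = new SnapshotRequest(
                "tableName", "snapshotName", Arrays.asList("FamilyA", "FamilyB"));
        System.out.println(selectStoreFiles(request, region).keySet());
    }
}
```

With a request naming FamilyA and FamilyB, the files of FamilyC are never referenced by the snapshot, so a later export would not have to copy them. A real implementation would also need to decide what happens when a requested family does not exist in the table; this sketch silently ignores it.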