Date: Thu, 12 Feb 2015 20:38:12 +0000 (UTC)
From: "Jesse Yates (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-13031) Ability to snapshot based on a key range

    [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318944#comment-14318944 ]

Jesse Yates commented on HBASE-13031:
-------------------------------------

+1 seems like a reasonable approach.

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Brainstorming
>    Affects Versions: 2.0.0, 0.94.26, 1.1.0, 0.98.11
>            Reporter: churro morales
>            Assignee: churro morales
>            Priority: Critical
>
> Posted on the mailing list and seems like some people are interested. A little background for everyone.
> We have a very large table that we would like to snapshot and transfer to another cluster (compressed data is always better to ship). Our problem is that it could take many weeks to transfer all of the data, and during that time, with major compactions, the data stored in DFS could double, which would cause us to run out of disk space.
> So we were thinking about adding the ability to snapshot a specific key range.
> Ideally, I feel the user would specify a start and stop key, each corresponding to a region boundary. If the boundaries change between the time the user submits the request and the time the snapshot is taken (due to merging or splitting of regions), the snapshot should fail.
> We would know which regions to snapshot, and if those changed between when the request was submitted and when the regions were locked, the snapshot could simply fail and the user would try again, instead of potentially getting more or less than they had anticipated. I was planning on storing the start / stop key in the SnapshotDescription, and from there it looks pretty straightforward: we just have to change the verifier code to accommodate the key ranges.
> If this design sounds good to anyone, or if I am overlooking anything, please let me know. Once we agree on the design, I'll write and submit the patches.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
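
To make the boundary check described in the issue concrete, here is a rough sketch of how it might look. It does not use any HBase API: `Region`, `key`, `selectRegions`, and `boundariesUnchanged` are hypothetical names invented for illustration, and the region list is assumed to be sorted and contiguous. An empty key stands for an open table start or end.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: every type and method here is hypothetical, not an HBase API.
final class KeyRangeSnapshotSketch {

    /** Hypothetical stand-in for a region covering [startKey, endKey); an empty key means "open". */
    static final class Region {
        final byte[] startKey;
        final byte[] endKey;

        Region(String startKey, String endKey) {
            this.startKey = key(startKey);
            this.endKey = key(endKey);
        }
    }

    /** Helper that turns a string into a row key. */
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Returns the regions that exactly cover [start, stop).
     * Throws if either key does not coincide with a region boundary.
     * Assumes sortedRegions is ordered by start key and has no gaps.
     */
    static List<Region> selectRegions(List<Region> sortedRegions, byte[] start, byte[] stop) {
        List<Region> selected = new ArrayList<>();
        boolean started = false;
        boolean stopped = false;
        for (Region r : sortedRegions) {
            if (!started && Arrays.equals(r.startKey, start)) {
                started = true;
            }
            if (started && !stopped) {
                selected.add(r);
                if (Arrays.equals(r.endKey, stop)) {
                    stopped = true;
                }
            }
        }
        if (!started || !stopped) {
            throw new IllegalArgumentException("start/stop key is not on a region boundary");
        }
        return selected;
    }

    /** True if the regions seen at lock time have the same boundaries as at request time. */
    static boolean boundariesUnchanged(List<Region> atRequest, List<Region> atLock) {
        if (atRequest.size() != atLock.size()) {
            return false;
        }
        for (int i = 0; i < atRequest.size(); i++) {
            if (!Arrays.equals(atRequest.get(i).startKey, atLock.get(i).startKey)
                    || !Arrays.equals(atRequest.get(i).endKey, atLock.get(i).endKey)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("", "b"), new Region("b", "m"),
                new Region("m", "t"), new Region("t", ""));
        List<Region> atRequest = selectRegions(regions, key("b"), key("t"));
        // Simulate a split of [b, m) into [b, g) and [g, m) before the regions are locked.
        List<Region> atLock = Arrays.asList(
                new Region("b", "g"), new Region("g", "m"), new Region("m", "t"));
        if (!boundariesUnchanged(atRequest, atLock)) {
            System.out.println("Region boundaries changed; snapshot should fail and be retried.");
        }
    }
}
```

Failing on any boundary mismatch, rather than trying to adjust the range, matches the intent stated in the issue: the user never silently gets more or less data than requested.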