Date: Thu, 17 Mar 2016 18:36:33 +0000 (UTC)
From: "Vladimir Rodionov (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-14142) HBase Backup/Restore Phase 3: Cells deduplication during backup

    [ https://issues.apache.org/jira/browse/HBASE-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200128#comment-15200128 ]

Vladimir Rodionov commented on HBASE-14142:
-------------------------------------------

Moved to Phase 3.
> HBase Backup/Restore Phase 3: Cells deduplication during backup
> ---------------------------------------------------------------
>
>                 Key: HBASE-14142
>                 URL: https://issues.apache.org/jira/browse/HBASE-14142
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>
> Since we do not record the last backed-up sequence IDs (MVCC) and do not restore up to that sequence ID (doing so is somewhat tricky), there will be some duplicate KVs in store files after the first incremental restore following a full backup. These duplicates result from how we perform the full backup and the first incremental backup after it. During a full backup we perform a distributed log roll and record, for every RS, the last WAL timestamp; then we take a snapshot. The next WAL after the recorded one will make it into the next incremental backup set, but it will contain some edits (puts, deletes) that were already captured by the preceding snapshot. During restore we first restore the snapshot and then replay the WALs, and this operation can create duplicate KVs in different store files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
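For illustration only, the overlap described in the issue can be sketched in plain Java. This is not the HBase backup code and uses no HBase API: the Cell class, the mergeWithoutDuplicates method, and the choice to treat two KVs as duplicates when row, family, qualifier, timestamp and type all match are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class CellDedupSketch {

    /** Hypothetical stand-in for a KV; NOT the HBase Cell/KeyValue class. */
    static final class Cell {
        final String row, family, qualifier, type;
        final long timestamp;

        Cell(String row, String family, String qualifier, long timestamp, String type) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.type = type;
        }

        // Assumption: identical coordinates and type mean "same edit".
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Cell)) return false;
            Cell c = (Cell) o;
            return timestamp == c.timestamp && row.equals(c.row) && family.equals(c.family)
                    && qualifier.equals(c.qualifier) && type.equals(c.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(row, family, qualifier, timestamp, type);
        }
    }

    /**
     * Merges cells restored from the snapshot with cells replayed from the
     * WAL, keeping the first occurrence of each edit and dropping repeats.
     */
    static List<Cell> mergeWithoutDuplicates(List<Cell> snapshotCells, List<Cell> walEdits) {
        Set<Cell> seen = new LinkedHashSet<>(snapshotCells); // preserves insertion order
        seen.addAll(walEdits);                               // already-present edits are ignored
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // The snapshot already holds the put on r2 that also sits in the next WAL.
        List<Cell> snapshot = Arrays.asList(
                new Cell("r1", "cf", "q", 100L, "Put"),
                new Cell("r2", "cf", "q", 101L, "Put"));
        List<Cell> wal = Arrays.asList(
                new Cell("r2", "cf", "q", 101L, "Put"),   // duplicate of a snapshot edit
                new Cell("r3", "cf", "q", 102L, "Put"));  // genuinely new edit

        System.out.println("naive replay: " + (snapshot.size() + wal.size()) + " cells");
        System.out.println("deduplicated: " + mergeWithoutDuplicates(snapshot, wal).size() + " cells");
    }
}
```

A real implementation would have to work on store files rather than in-memory lists, so the set-based merge above only demonstrates which edits count as duplicates, not how to remove them at scale.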