Date: Thu, 30 Nov 2017 20:08:00 +0000 (UTC)
From: "Vladimir Rodionov (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
archived-at: Thu, 30 Nov 2017 20:08:05 -0000

[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273329#comment-16273329 ]

Vladimir Rodionov commented on HBASE-17852:
-------------------------------------------

The reason is the simplicity of the implementation. Isn't this obvious? Should I have spent time trying to implement Tx management instead? I doubt it.

Did I answer the original question? I thought that we are technical guys and that we need technical answers. It seems that I was wrong.

User intervention is required only if the user kills the backup process or it dies on the client side for some other reason. All cluster-side failures get repaired automatically. I see nothing painful for users here, [~mdrob], especially once I implement the auto-repair feature. This is actually the question: should we repair automatically, or do we need to inform the user that the last backup/merge/delete command failed abnormally and that the user needs to run repair? Don't we still have *hbck* for this reason?
Repair all the s**t which happens periodically in an HBase cluster.

Moving a feature out of beta-1 only because someone does not like the *attitude of a contributor* means that something is not going well in the HBase community.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
> ------------------------------------------------------------------------------------
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the backup meta-table (backup system table). This procedure is lightweight because the meta table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by cleaning up partial data in the backup destination and then restoring the backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for example), the next time the user tries create/merge/delete, he or she will see an error message saying that the system is in an inconsistent state and that repair is required, and will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and the BackupObservers), we introduce a small table used ONLY to keep the listing of bulk loaded files. All backup observers will work only with this new table. The reason: in case of a failure during backup create/delete/merge/restore, when the system performs automatic rollback, some data written by backup observers during the failed operation may be lost. This is what we try to avoid.
> # The second table keeps only bulk load related references. We do not care about the consistency of this table, because bulk load is an idempotent operation and can be repeated after a failure. Partially written data in the second table does not affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded files) corresponds to files which have not yet been loaded successfully and hence are not visible to the system.
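
The rollback-via-snapshot approach in the quoted design can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the HBase implementation: `BackupSystem`, `RepairRequired`, `ClientCrash` and all method names are hypothetical, the meta-table is a plain dict, and a client-side death is simulated by an exception that the rollback handler deliberately does not catch.

```python
import copy


class RepairRequired(Exception):
    """Raised when an earlier operation left the system inconsistent."""


class ClientCrash(BaseException):
    """Simulates the client process dying; run() deliberately does not catch it."""


class BackupSystem:
    """Hypothetical in-memory stand-in for the backup tables and destination."""

    def __init__(self):
        self.meta = {}            # backup meta-table (backup system table)
        self.bulk_load_refs = []  # second table, written only by observers
        self.destination = set()  # files present in the backup destination
        self._snapshot = None     # snapshot of meta taken before an operation
        self._partial = set()     # files written by the operation in flight

    def record_bulk_load(self, path):
        # Observers never touch self.meta, so a rollback cannot lose their writes.
        self.bulk_load_refs.append(path)

    def write_file(self, path):
        if path not in self.destination:
            self._partial.add(path)
        self.destination.add(path)

    def run(self, operation):
        if self._snapshot is not None:
            # A leftover snapshot means a previous client died mid-operation.
            raise RepairRequired("system is in inconsistent state, run repair")
        self._snapshot = copy.deepcopy(self.meta)  # step 1: snapshot meta
        self._partial = set()
        try:
            operation(self)
        except Exception:
            self._rollback()  # step 2: server-side failure, automatic rollback
            raise
        self._snapshot = None
        self._partial = set()

    def repair(self):
        # Step 3: explicit repair after a client-side failure.
        if self._snapshot is not None:
            self._rollback()

    def _rollback(self):
        self.destination -= self._partial  # clean up partial data
        self.meta = self._snapshot         # restore meta from the snapshot
        self._snapshot = None
        self._partial = set()


def good_backup(system):
    system.write_file("backup_1/f1")
    system.meta["backup_1"] = "COMPLETE"


def server_failure(system):
    system.write_file("backup_2/f1")
    system.meta["backup_2"] = "RUNNING"
    raise RuntimeError("simulated server-side failure")


def client_crash(system):
    system.write_file("backup_3/f1")
    system.meta["backup_3"] = "RUNNING"
    raise ClientCrash()
```

In this sketch, running `server_failure` leaves the meta-table and destination exactly as they were before the call. Running `client_crash` leaves partial state behind, so every later `run` raises `RepairRequired` until `repair()` is called. Entries added through `record_bulk_load` survive both kinds of rollback, which is the point of keeping them in a separate table.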