Subject: Re: [DISCUSSION] Items to purge from branch-2 before we cut hbase-2.0.0-beta1
To: dev@hbase.apache.org
From: Josh Elser
Date: Thu, 2 Nov 2017 08:47:03 -0400

That's quite a good argument :) -- there's a difference between occasional verification for building confidence and full data verification for every single backup (which is how HBASE-19106 read to me).

I still think the latter (thus, 19106, verbatim in its ask) would be unwieldy; however, the ability to do it ad hoc as you describe has benefits. It also makes me wonder how reusable VerifyReplication is at its core (I mean, it's more or less the same thing under the hood, right?).

Let's continue to hash out what we think the scope of a data verification "feature" should be and then get that put up on 19106. This is good.

On 11/1/17 11:32 PM, Andrew Purtell wrote:
> Potential adopters will absolutely want to construct for themselves a verifiable live exercise. Tooling that lets you do that against a snapshot would be the way to go, I think. Once you do that exercise, probably a few times, you can trust the backup solution enough for restore into production, where verification may or may not be possible.
>
> A user who claims they'd rather not verify that their backup solution works on account of performance concerns shouldn't be taken seriously.
> (Not that you would (smile))
>
>
>> On Nov 1, 2017, at 7:55 PM, Josh Elser wrote:
>>
>>> On 11/1/17 8:22 PM, Sean Busbey wrote:
>>> On Wed, Nov 1, 2017 at 7:08 PM, Vladimir Rodionov wrote:
>>>> There is no way to validate the correctness of a backup in the general case.
>>>>
>>>> You can restore a backup into a temp table, but then what? Read rows one by one from the temp table and look them up in the primary table? That won't work, because rows can be deleted or modified since the last backup was taken.
>>>>
>>> This is why we have snapshots, no?
>>
>> True, we could try to take a snapshot exactly when the backup was taken (likely still difficult to coordinate on an active system), but in what reality would we actually want to do this? Most users I see are so concerned about the cost of running compactions (which actually make performance better!) that they wouldn't devote a non-negligible portion of their computing power and available space to re-instantiating their data (at least once) just to make sure a copy worked correctly.
>>
>> We have WALs, HFiles, and some metadata we'd export in a backup, right? Why not intrinsically perform some validation that things like headers, trailers, etc. still exist on the files we exported (e.g. open the file, read the header, seek to the end, verify the trailer, etc.)? I feel like that's a much more tenable solution, one that isn't going to carry a ridiculous burden like restoring tables of modest size and above.
>>
>> This smells like it's really asking to verify a distcp, rather than to verify backups. There is certainly something we can do to give a reasonable level of confidence that doesn't involve reconstituting the whole thing.
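For concreteness, the "open file, read header, seek to end, verify trailer" check described above could look roughly like the sketch below. This is a hypothetical, minimal illustration in Python, not the real thing: it does not parse actual HFile structures or trailer magic, it only confirms that both ends of an exported file are present and readable, which is the lightweight structural confidence the thread is talking about.

```python
import os

def sanity_check_file(path, header_len=8, trailer_len=8):
    """Lightweight structural check on one exported backup file.

    Sketch of the 'open file, read header, seek to end, verify trailer'
    idea: confirm the file is large enough to hold a header and trailer,
    and that both regions can actually be read back. header_len and
    trailer_len are illustrative placeholders, not real HFile sizes.
    """
    size = os.path.getsize(path)
    if size < header_len + trailer_len:
        # Too small to contain both a header and a trailer; clearly truncated.
        return False
    with open(path, "rb") as f:
        header = f.read(header_len)       # read the leading bytes
        f.seek(size - trailer_len)        # jump to where the trailer should be
        trailer = f.read(trailer_len)     # read the trailing bytes
    # Both reads must return the full requested length.
    return len(header) == header_len and len(trailer) == trailer_len
```

A real implementation would additionally verify format-specific markers (e.g. the HFile trailer's version/magic fields) rather than just byte counts, but even this level of check would catch truncated or zero-length copies without reconstituting any table.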