Subject: Re: [DISCUSSION] Items to purge from branch-2 before we cut hbase-2.0.0-beta1
To: dev@hbase.apache.org
From: Josh Elser
Date: Thu, 2 Nov 2017 08:47:03 -0400

That's quite a good argument :) -- there's a difference between occasional verification for building confidence and full data verification for every single backup (which is how HBASE-19106 read to me).

I still think the latter (thus, 19106, verbatim in its ask) would be unwieldy; however, the ability to do it ad hoc as you describe has benefits. It also makes me wonder how reusable VerifyReplication is at its core (I mean, it's more or less the same thing under the hood, right?).

Let's continue to hash out what we think the scope of a data verification "feature" should be and then get that put up on 19106. This is good.

On 11/1/17 11:32 PM, Andrew Purtell wrote:
> Potential adopters will absolutely want to construct for themselves a verifiable live exercise. Tooling that lets you do that against a snapshot would be the way to go, I think. Once you do that exercise, probably a few times, you can trust the backup solution enough for restore into production, where verification may or may not be possible.
>
> A user who claims they'd rather not verify that their backup solution works on account of performance concerns shouldn't be taken seriously.
> (Not that you would (smile))
>
>
>> On Nov 1, 2017, at 7:55 PM, Josh Elser wrote:
>>
>>> On 11/1/17 8:22 PM, Sean Busbey wrote:
>>> On Wed, Nov 1, 2017 at 7:08 PM, Vladimir Rodionov wrote:
>>>> There is no way to validate the correctness of a backup in the general case.
>>>>
>>>> You can restore a backup into a temp table, but then what? Read rows one by one from the temp table and look them up in the primary table? That won't work, because rows can be deleted or modified since the last backup was taken.
>>>>
>>> This is why we have snapshots, no?
>>
>> True, we could try to take a snapshot exactly when the backup was taken (likely still difficult to coordinate on an active system), but in what reality would we actually want to do this? Most users I see are so concerned about the cost of running compactions (which actually make performance better!) that they wouldn't devote a non-negligible portion of their computing power and available space to re-instantiating their data (at least once) just to make sure a copy worked correctly.
>>
>> We have WALs, HFiles, and some metadata we'd export in a backup, right? Why not intrinsically perform some validation that things like headers, trailers, etc. still exist on the files we exported (e.g. open the file, read the header, seek to the end, verify the trailer, etc.)? I feel like that's a much more tenable solution, one that isn't going to carry a ridiculous burden like restoring tables of modest size and above.
>>
>> This smells like it's really asking to verify a distcp, rather than to verify backups. There is certainly something we can do to give a reasonable level of confidence that doesn't involve reconstituting the whole thing.
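For concreteness, the "open file, read header, seek to end, verify trailer" check described above could look roughly like the sketch below. This is a hypothetical, minimal illustration in Python, not the real thing: it does not parse actual HFile structures or trailer magic, it only confirms that both ends of an exported file are present and readable, which is the lightweight structural confidence the thread is talking about.

```python
import os

def sanity_check_file(path, header_len=8, trailer_len=8):
    """Lightweight structural check on one exported backup file.

    Sketch of the 'open file, read header, seek to end, verify trailer'
    idea: confirm the file is large enough to hold a header and trailer,
    and that both regions can actually be read back. header_len and
    trailer_len are illustrative placeholders, not real HFile sizes.
    """
    size = os.path.getsize(path)
    if size < header_len + trailer_len:
        # Too small to contain both a header and a trailer; clearly truncated.
        return False
    with open(path, "rb") as f:
        header = f.read(header_len)       # read the leading bytes
        f.seek(size - trailer_len)        # jump to where the trailer should be
        trailer = f.read(trailer_len)     # read the trailing bytes
    # Both reads must return the full requested length.
    return len(header) == header_len and len(trailer) == trailer_len
```

A real implementation would additionally verify format-specific markers (e.g. the HFile trailer's version/magic fields) rather than just byte counts, but even this level of check would catch truncated or zero-length copies without reconstituting any table.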