hbase-dev mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: [DISCUSSION] Items to purge from branch-2 before we cut hbase-2.0.0-beta1.
Date Thu, 02 Nov 2017 02:55:41 GMT

On 11/1/17 8:22 PM, Sean Busbey wrote:
> On Wed, Nov 1, 2017 at 7:08 PM, Vladimir Rodionov
> <vladrodionov@gmail.com> wrote:
>> There is no way to validate correctness of backup in a general case.
>> You can restore backup into temp table, but then what? Read rows one-by-one
>> from temp table and look them up
>> in a primary table? Won't work, because rows can be deleted or modified
>> since the last backup was done.
> This is why we have snapshots, no?

True, we could try to take a snapshot at exactly the moment the backup
was taken (likely still difficult to coordinate on an active system), but
in what reality would we actually want to do this? Most users I see are so
concerned about the cost of running compactions (which actually make
performance better!) that they wouldn't spend a non-negligible portion
of their computing power and available space re-instantiating their
data (at least once) just to make sure a copy worked correctly.

We have WALs, HFiles, and some metadata we'd export in a backup, right?
Why not intrinsically perform some validation that things like headers,
trailers, etc. still exist on the files we exported (e.g. open file, read
header, seek to end, verify trailer)? I feel like that's a much
more tenable solution that isn't going to carry a ridiculous burden like
restoring tables of modest size and above.
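
A minimal sketch of that kind of cheap structural check, in Python for
brevity. The magic values and the function name here are hypothetical
placeholders, not the real HFile or WAL markers; a real check would use
whatever the exported file format actually defines.

```python
import os

# Hypothetical magic values for illustration only (assumption); a real
# check would use the markers defined by the exported file format.
HEADER_MAGIC = b"HDRMAGIC"
TRAILER_MAGIC = b"TRLMAGIC"


def check_exported_file(path, header_magic=HEADER_MAGIC,
                        trailer_magic=TRAILER_MAGIC):
    """Return a list of problems; an empty list means the checks passed."""
    size = os.path.getsize(path)
    if size < len(header_magic) + len(trailer_magic):
        return ["file too short to hold header and trailer"]

    problems = []
    with open(path, "rb") as f:
        # Open file, read header.
        if f.read(len(header_magic)) != header_magic:
            problems.append("header magic missing")
        # Seek to end, verify trailer.
        f.seek(-len(trailer_magic), os.SEEK_END)
        if f.read(len(trailer_magic)) != trailer_magic:
            problems.append("trailer magic missing")
    return problems
```

This only reads a few bytes per file, so the cost is tiny compared to a
restore, but it would only catch truncated or obviously mangled files,
not corruption in the middle of a file.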

This smells like it's really asking to verify a distcp rather than to
verify backups. There is certainly something we can do to give a
reasonable level of confidence that doesn't involve reconstituting the
whole thing.
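
For the "verify the copy" half, a rough sketch of the idea, assuming both
trees are reachable as local paths (which is not true of HDFS; this uses
plain Python file I/O rather than any Hadoop API, and the function names
are made up for illustration). It compares existence, size, and a SHA-256
digest for every file under the source directory.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(src_dir, dst_dir):
    """Return sorted relative paths that are missing or differ in dst_dir."""
    mismatches = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            if not os.path.isfile(dst):
                mismatches.append(rel)
            # Cheap size check first; only hash when sizes agree.
            elif os.path.getsize(src) != os.path.getsize(dst):
                mismatches.append(rel)
            elif file_digest(src) != file_digest(dst):
                mismatches.append(rel)
    return sorted(mismatches)
```

Hashing still reads every byte once on each side, so it is not free, but
it needs no extra storage and no restore into a temp table.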
