hbase-dev mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: [DISCUSS] Plan for Distributed testing of Backup and Restore
Date Tue, 12 Sep 2017 16:53:08 GMT
Thanks for the quick feedback!

On 9/12/17 12:36 PM, Stack wrote:
> On Tue, Sep 12, 2017 at 9:33 AM, Andrew Purtell <andrew.purtell@gmail.com>
> wrote:
>> I think those are reasonable criteria Josh.
>> What I would like to see is something like "we ran ITBLL (or custom
>> generator with similar correctness validation if you prefer) on a dev
>> cluster (5-10 nodes) for 24 hours with server killing chaos agents active,
>> attempted 1,440 backups (one per minute), of which 1,000 succeeded and 100%
>> of these were successfully restored and validated." This implies your
>> points on automation and no manual intervention. Maybe the number of
>> successful backups under challenging conditions will be lower. Point is
>> they demonstrate we can rely on it even when a cluster is partially
>> unhealthy, which in production is often the normal order of affairs.

I like it. I hadn't thought about stressing quite this aggressively, but
now that I think about it, it sounds like a great plan. Having some
ballpark measure to quantify the cost of a "backup-heavy" workload would
be cool, in addition to seeing how the system reacts under unexpected
conditions.

> Sounds good to me.
> How will you test the restore aspect? After 1k (or whatever makes sense)
> incremental backups over the life of the chaos, could you restore and
> validate that the table had all expected data in place.

Exactly. My thinking was that, at any point, we should be able to do a
restore and validate. Maybe something like: every Nth ITBLL iteration,
make a new backup point, restore a previous backup point, verify, then
restore to the newest backup point. The previous backup point could be
either a full or an incremental one.
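
To make that loop concrete, here is a rough sketch as a pure in-memory
Python simulation. It does not call the HBase backup/restore API or ITBLL;
the function name, the dict standing in for the table, and the row-count
check standing in for ITBLL verification are all assumptions for
illustration. Incremental points are modeled as plain snapshots to keep
the sketch short.

```python
import random

ROWS_PER_ITERATION = 10  # hypothetical: rows written per simulated ITBLL iteration


def run_cycle(iterations, backup_every, seed=0):
    """Simulate: load data; every Nth iteration back up, restore an older
    point, verify it, then roll forward to the newest point."""
    rng = random.Random(seed)
    table = {}     # stand-in for the live table
    backups = []   # backup points, oldest first: (kind, iteration, snapshot)
    verified = 0

    for i in range(1, iterations + 1):
        # Stand-in for one generator iteration: write some rows.
        for j in range(ROWS_PER_ITERATION):
            table[f"row-{i}-{j}"] = rng.random()

        if i % backup_every != 0:
            continue

        # 1. Make a new backup point (first is full, the rest incremental).
        kind = "full" if not backups else "incremental"
        backups.append((kind, i, dict(table)))

        # 2. Restore a previously taken backup point, chosen at random.
        if len(backups) > 1:
            _, taken_at, snapshot = backups[rng.randrange(len(backups) - 1)]
            table = dict(snapshot)

            # 3. Verify: the restored table must hold exactly the rows that
            #    had been written when that point was taken.
            if len(table) != taken_at * ROWS_PER_ITERATION:
                raise AssertionError(f"restore of point @{taken_at} lost data")
            verified += 1

        # 4. Restore to the newest backup point before loading continues.
        table = dict(backups[-1][2])

    return len(backups), verified, len(table)
```

For example, run_cycle(20, 5) takes four backup points (one full, three
incremental), verifies three restores of older points, and ends with the
table back at the newest point. In a real run, step 3 would be the ITBLL
verify job, and chaos agents would be killing servers throughout.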

Vlad: I'm obviously curious to see what you think about this stuff, in 
addition to what you already had in mind :)
