hbase-dev mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0
Date Wed, 08 Nov 2017 18:35:25 GMT
On 11/8/17 1:26 PM, Andrew Purtell wrote:
> I won't speak to the timing aspects of this, that's up to the RM, but the
> testing details look reasonable to me.

Understood and agree. Thanks for your input!

> With respect to chaos testing, the
> following goals would be good:
> 
> - Some backups and restores succeed even with masters and RSes going up and
> down. The resiliency can always be improved later, but we can't rely on no
> failures for the entire duration of a backup or restore operation to get a good
> result, especially for restore.

Yup! The expectation (if not explicitly stated) is that we would 
work our way up to the ServerKilling monkey. This should be 
trivial to implement, since IntegrationTestBase would wire it 
up for us.

> - Backups are not corrupted by failures. Or, corrupted (partial?) backups
> are identified and ignored and there are still good backups remaining which
> can be used for restore.
> 
> - When the verification tool says a backup and restore are good, they
> really are.

/me nods. Agreed.

I think we'll learn a bit about failure situations (the doc intentionally 
avoided defining problems/solutions), and the problems we see will help 
shape the solutions we need to build.
