hbase-dev mailing list archives

From Josh Elser <els...@apache.org>
Subject [DISCUSS] Plan for Distributed testing of Backup and Restore
Date Tue, 12 Sep 2017 16:07:02 GMT
On 9/11/17 11:52 PM, Stack wrote:
> On Mon, Sep 11, 2017 at 11:07 AM, Vladimir Rodionov <vladrodionov@gmail.com>
> wrote:
> 
>> ...
>> That is mostly it. Yes, we have not done real testing with real data on a
>> real cluster yet, except QA testing on a small OpenStack
>> cluster (10 nodes). That is probably our biggest minus right now. I
>> would like to inform the community that this week we are going to start
>> full-scale testing with reasonably sized data sets.
>>
> ... Completion of HA seems important, as is the result of the scale testing.
> 

I think we should knock out a rough sketch of what effective "scale" 
testing would look like, since that is a very subjective phrase. Let me 
start the ball rolling with a few things that come to mind.

(interpreting requirements as per rfc2119)

* MUST have >5 RegionServers and >1 Masters in play
* MUST have non-trivial final data sizes (on the order of hundreds of 
GB or more)
* MUST have some clear pass/fail determination for correctness of B&R 
(a rough sketch of one way to do this follows the list)
* MUST have some fault-injection

* SHOULD be a completely automated test that does not require a human 
to coordinate or execute commands
* SHOULD be able to collect operational insight (metrics) while the 
test runs, to help determine whether it succeeded
* SHOULD NOT require manual intervention, e.g. working around known 
issues/limitations
* SHOULD reuse the IntegrationTest framework in hbase-it
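
To make the pass/fail item concrete, here is a rough sketch of the kind of 
check I have in mind. The class/table names and the row count below are just 
placeholders, and the backup/restore steps are left as comments because the 
exact invocation depends on the B&R tooling we land on. The point is only 
that every value is a pure function of its row key, so a restore can be 
re-verified byte-for-byte without keeping a second copy of the data around.

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Sketch of a pass/fail check for B&R correctness. Every value is a pure
 * function of its row key, so the restored table can be re-verified without
 * a second copy of the data. Names and sizes are placeholders.
 */
public class BackupRestoreVerificationSketch {

  private static final TableName TABLE = TableName.valueOf("ITBackupRestore"); // placeholder
  private static final byte[] FAMILY = Bytes.toBytes("f");
  private static final byte[] QUAL = Bytes.toBytes("q");

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    long numRows = 1_000_000L; // knob: scale this up toward the hundreds-of-GB target
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TABLE)) {

      // Load deterministic data (BufferedMutator/batching omitted for brevity).
      for (long i = 0; i < numRows; i++) {
        Put p = new Put(rowKey(i));
        p.addColumn(FAMILY, QUAL, expectedValue(i));
        table.put(p);
      }

      // 1. Take a full/incremental backup here (invocation depends on the B&R tooling).
      // 2. Truncate or otherwise damage the table.
      // 3. Restore the table from the backup.

      // Pass/fail: every row must come back byte-for-byte.
      for (long i = 0; i < numRows; i++) {
        Result r = table.get(new Get(rowKey(i)));
        if (r.isEmpty() || !Arrays.equals(r.getValue(FAMILY, QUAL), expectedValue(i))) {
          throw new IllegalStateException("Verification failed at row " + i);
        }
      }
      System.out.println("Backup/restore verification passed for " + numRows + " rows");
    }
  }

  private static byte[] rowKey(long i) {
    return Bytes.toBytes(String.format("row-%012d", i));
  }

  /** Value derived solely from the row index so it can be recomputed at verify time. */
  private static byte[] expectedValue(long i) {
    return Bytes.toBytes(Long.toString(i * 31L));
  }
}

Something shaped like this could presumably live in hbase-it and extend 
IntegrationTestBase rather than being a standalone main(), so it picks up 
the usual command-line and ChaosMonkey plumbing for free.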

Since correctness is the primary concern, ITBLL sounds like a good 
starting point so we avoid re-writing similar verification logic. 
ChaosMonkey is always great for fault-injection.
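
For anyone who hasn't looked at ITBLL recently, here is a toy, in-memory 
sketch of the invariant it verifies (the real IntegrationTestBigLinkedList 
does this with MapReduce jobs over HBase rows, not a HashMap): every key 
stores a reference to the previously written key, so a row lost by a faulty 
restore shows up as a reference that no longer resolves.

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/**
 * Toy, in-memory illustration of the invariant ITBLL checks. Each generated
 * key stores a reference to the previously written key, so any row lost
 * during a backup/restore cycle surfaces as a dangling reference.
 */
public class LinkedListInvariantSketch {

  private static final long NO_REF = Long.MIN_VALUE; // sentinel for the first node in a chain

  public static void main(String[] args) {
    Random rng = new Random(42);
    Map<Long, Long> table = new HashMap<>(); // key -> previous key; stands in for the HBase table

    // Generate a chain where every node points at the node written before it.
    long prev = NO_REF;
    long victim = 0L;
    for (int i = 0; i < 100_000; i++) {
      long key = rng.nextLong();
      table.put(key, prev);
      prev = key;
      if (i == 50_000) {
        victim = key; // remember a mid-chain key to "lose" later
      }
    }

    // Simulate data loss from a faulty restore: drop one mid-chain row.
    table.remove(victim);

    // Verify: every reference must resolve to a key that still exists.
    long broken = 0;
    for (Long ref : table.values()) {
      if (ref != NO_REF && !table.containsKey(ref)) {
        broken++;
      }
    }
    System.out.println(broken == 0
        ? "PASS: chain intact"
        : "FAIL: " + broken + " broken reference(s)");
  }
}

IIRC the real Verify job also counts referenced/unreferenced/undefined 
cells, which catches losses at the ends of chains too; reusing that seems 
much cheaper than writing new verification logic.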

Thoughts?
