hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: bringing the codebases back in line
Date Fri, 22 Oct 2010 11:40:27 GMT
On 22/10/10 01:10, Konstantin Boudnik wrote:
> On Thu, Oct 21, 2010 at 05:53PM, Ian Holsman wrote:
>> In discussing it with people, I've heard that a major issue (not the only
>> one, I'm sure) is lack of resources to actually test the Apache releases on
>> large clusters, and that it is very hard getting this done in short cycles
>> (hence the large gap between 20.x and 21).
>
> I do agree the lack of resources for testing Hadoop is a problem. However,
> there might be some slight difference in the meaning of the word 'resources' ;)
>
> The only way, IMO, to get reasonable testing done on a system as complex as
> Hadoop is to invest in automatic validation of builds at the system level. This
> requires a few things (resources, if you will):
>    - extra hardware (the easiest and cheapest problem)
>    - automatic deployment, testing, and analysis
>    - development of system tests that are able to control and observe
>      cluster behavior (in other words, something more sophisticated than
>      just shell scripts)
>
> And for semi-adequate system testing you don't need a large cluster: 10-20
> nodes will be sufficient in most cases. But the automation of all the
> processes, starting from deployment, is the key. Test automation is in
> slightly better shape, since Hadoop has a system test framework called Herriot
> (part of the Hadoop code base for about 7 months now), but it still needs
> further extending.
>

+1 for testing. I would like to help with this, but my test stuff 
depends on my lifecycle stuff, which I need to sit down with, sync up 
with trunk, and work out how to get in.

One thing you can do in a virtual world which you can't do in the 
physical space is reconfigure the LAN on the fly, to see what happens. 
For example, I could set up VLANs of two racks and a switch between 
them, then partition the two and see what happens, while a simulated 
external load (separate issue) hits the NN with the same amount of 
traffic. Fun things.
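
As a rough illustration of the partition experiment, here is a toy 
in-memory model in Python. It does not reconfigure any real VLAN or talk 
to Hadoop; the class name, host names and rack layout are all made up 
for the sketch, and it only tracks which datanodes the NN can still 
reach while the inter-rack link is down.

```python
from dataclasses import dataclass


@dataclass
class ToyLan:
    """Hypothetical model: racks of hosts joined by one inter-rack link."""
    racks: dict            # rack name -> set of host names
    link_up: bool = True   # state of the switch link between the racks

    def rack_of(self, host):
        for rack, hosts in self.racks.items():
            if host in hosts:
                return rack
        raise KeyError(host)

    def reachable(self, a, b):
        # Same-rack hosts always see each other; cross-rack traffic
        # needs the inter-rack link to be up.
        return self.rack_of(a) == self.rack_of(b) or self.link_up

    def partition(self):
        self.link_up = False

    def heal(self):
        self.link_up = True

    def datanodes_seen_by(self, namenode):
        # Every host other than the namenode is treated as a datanode.
        return sorted(
            host
            for hosts in self.racks.values()
            for host in hosts
            if host != namenode and self.reachable(namenode, host)
        )


# Assumed layout: the NN sits in rack1 with two DNs, two more DNs in rack2.
lan = ToyLan({"rack1": {"nn", "dn1", "dn2"}, "rack2": {"dn3", "dn4"}})

before = lan.datanodes_seen_by("nn")
lan.partition()
during = lan.datanodes_seen_by("nn")
lan.heal()
after = lan.datanodes_seen_by("nn")

print("before:", before)
print("during:", during)
print("after: ", after)
```

A real test would swap the `partition()` and `heal()` calls for whatever 
the virtual infrastructure offers to drop and restore the link, and 
would run the simulated external load against the NN throughout.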
