On 22/10/10 01:10, Konstantin Boudnik wrote:
> On Thu, Oct 21, 2010 at 05:53PM, Ian Holsman wrote:
>> In discussing it with people, I've heard that a major issue (not the only
>> one, I'm sure) is lack of resources to actually test the Apache releases on
>> large clusters, and that it is very hard getting this done in short cycles
>> (hence the large gap between 20.x and 21).
>
> I do agree the lack of resources for testing Hadoop is a problem. However,
> there might be some slight difference in the meaning of the word 'resources' ;)
>
> The only way, IMO, to get reasonable testing done on a system as complex as
> Hadoop is to invest in automatic validation of builds at system level. This
> requires a few things (resources, if you will):
> - extra hardware (the easiest and cheapest problem)
> - automatic deployment, testing, and analysis
> - development of system tests that are able to control and observe cluster
> behavior (in other words, something more sophisticated than just shell
> scripts)
>
> And for semi-adequate system testing you don't need a large cluster: 10-20
> nodes will be sufficient in most cases. But the automation of all the
> processes, starting from deployment, is the key. Test automation is in
> slightly better shape, since Hadoop has a system test framework called Herriot
> (part of the Hadoop code base for about 7 months now), but it still needs
> further extending.
>
+1 for testing. I would like to help with this, but my test stuff
depends on my lifecycle stuff, which I need to sit down with, sync up
with trunk, and work out how to get in.
One thing you can do in a virtual world which you can't do in the
physical space is reconfigure the LAN on the fly, to see what happens.
For example, I could set up VLANs of two racks and a switch between
them, then partition the two and see what happens, while a simulated
external load (separate issue) hits the NN with the same amount of
traffic. Fun things.
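
To make the partition idea concrete, here is a toy sketch. It is only an in-memory model of two racks and the link between them: it touches no real VLANs, switches, or Hadoop APIs, and every name in it (ToyCluster, the rack and node names) is made up for illustration. It only shows the shape of the check: record what the NN can see, cut the inter-rack link, and compare.

```python
from dataclasses import dataclass


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for two racks joined by one switch."""
    racks: dict                      # rack name -> set of node names
    inter_rack_link_up: bool = True  # state of the link between the racks

    def rack_of(self, node):
        # Find which rack a node lives in.
        for rack, nodes in self.racks.items():
            if node in nodes:
                return rack
        raise KeyError(node)

    def reachable(self, a, b):
        # Same-rack nodes always see each other; cross-rack traffic
        # needs the inter-rack link to be up.
        return self.rack_of(a) == self.rack_of(b) or self.inter_rack_link_up

    def visible_from(self, node):
        # Every other node that `node` can currently reach.
        all_nodes = set().union(*self.racks.values())
        return {n for n in all_nodes if n != node and self.reachable(node, n)}


# Assumed layout: the NN and two DNs in one rack, two DNs in the other.
cluster = ToyCluster(racks={
    "rack1": {"nn", "dn1", "dn2"},
    "rack2": {"dn3", "dn4"},
})

before = cluster.visible_from("nn")
cluster.inter_rack_link_up = False   # partition the two racks
after = cluster.visible_from("nn")

lost = sorted(before - after)
print("nodes the NN lost after the partition:", lost)
```

In a real run the interesting part is what this model leaves out: how long the NN takes to notice, what it does about re-replication, and what happens when the link comes back.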