hadoop-common-dev mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject [DISCUSSION]: Future of Hadoop system testing
Date Fri, 08 Oct 2010 21:06:05 GMT
All,

I want to start a discussion about future approaches to perform Hadoop
system (and potentially other types of) testing in 0.22 and later.

As many of you know, a recent effort by a number of Hadoop developers has
produced a new system test framework, codenamed Herriot. If you have not
heard of it, please see HADOOP-6332 and
http://wiki.apache.org/hadoop/HowToUseSystemTestFramework

Now, Herriot is a great tool that allows much wider and more powerful
inspection of, and intervention into, remote Hadoop daemons (aka
observability and controllability). There is a catch, however: these powers
come at the cost of build instrumentation.

On the other hand, there is a fairly large number of cases where no
introspection into the daemons' internals is required. These can be carried
out through simple communication via the Hadoop CLI. To name a few: testing
ACL refreshes, basic file ops, etc.
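
To make the CLI-only case concrete, here is a minimal sketch of how such a
check could be described: a command line plus the exit code and output
pattern we expect from it. CliCheck is a hypothetical name, not part of
Hadoop or Herriot, and running the command is deliberately left to whatever
executor we end up agreeing on.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not an existing Hadoop class.
final class CliCheck {
    final String[] command;
    private final int expectedExitCode;
    private final Pattern expectedOutput;

    CliCheck(int expectedExitCode, String outputRegex, String... command) {
        this.command = command.clone();
        this.expectedExitCode = expectedExitCode;
        this.expectedOutput = Pattern.compile(outputRegex);
    }

    // True when the observed exit code equals the expected one and the
    // regex is found somewhere in the captured output.
    boolean passes(int actualExitCode, String actualOutput) {
        return actualExitCode == expectedExitCode
                && expectedOutput.matcher(actualOutput).find();
    }
}
```

A basic file op check might then read
new CliCheck(0, "", "hadoop", "fs", "-test", "-e", "/user/qa/probe"),
where the path is made up and the empty pattern accepts any output.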

However, there is as yet no common understanding of, or agreement on, how
this might be performed. I'd like to start a conversation that will,
hopefully, let us work out some tactics.

I can see three possible approaches (there might be more that I just don't
see):
  1) Adding a special mode to Herriot to work with non-instrumented clusters.
  In such a mode (let's call it 'standard' for now) the framework will have
  only reduced functionality, such as:
    - start/stop a remote daemon
    - change/push a daemon configuration
    - simple(-ier) interfaces to HDFS via DFSClient
    - simple(-ier) interface to work with MR
    - (the list can obviously be extended)

  2) A Groovy (or even bash) front-end for system tests. The latter is pretty
  poor, in my opinion, because unlike Groovy, a Unix shell cannot work with
  the public Hadoop (Java) APIs directly. Groovy, on the other hand, is much
  more expressive than Java; it is highly dynamic and provides a MOP
  (meta-object protocol), among other things. (Please, let's not start a
  discussion about Groovy vs. Scala here!)

  3) Creating custom SSH-based command executors on top of CLITestHelper and
  then reusing the rest of that infrastructure to create tests similar to
  TestCLI. 
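
To illustrate the third option, below is a rough sketch of what an SSH-based
executor could look like. It does not use or extend the actual CLITestHelper
classes; SshCommandExecutor and Result are hypothetical names, and it assumes
that an ssh client is on the PATH and that key-based login to the target host
is already set up (BatchMode=yes makes ssh fail rather than prompt for a
password).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the CLITestHelper API.
final class SshCommandExecutor {
    private final String userAtHost;

    SshCommandExecutor(String userAtHost) {
        this.userAtHost = userAtHost;
    }

    // Builds the local argv. ssh hands the last argument to the remote
    // login shell, so it may contain a whole command line.
    List<String> buildCommand(String remoteCommand) {
        return Arrays.asList("ssh", "-o", "BatchMode=yes", userAtHost, remoteCommand);
    }

    // Runs the command remotely and captures stdout and stderr together.
    Result execute(String remoteCommand) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(remoteCommand))
                .redirectErrorStream(true)
                .start();
        String output = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        return new Result(exitCode, output);
    }

    // Exit code and combined output of one remote command.
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }
}
```

A test would then call something like
new SshCommandExecutor("qa@namenode.example.com").execute("hadoop fs -ls /")
(the user and host are made up) and hand the exit code and output to the
same kind of comparison that TestCLI-style tests perform.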

My ultimate goal is, essentially, to have a single uniform test
driver/framework (such as JUnit) to control the execution of all or most
types of tests, from the TUT (true unit test) end up to system and,
potentially, load tests.

One benefit of such an approach is that it will facilitate the integration
of other types of testing into the CI infrastructure (read: Hudson). It will
also provide a well-supported test development environment that is familiar
to many, lowering the learning curve for potential contributors who might
want to join the Hadoop community and help us make Hadoop an even better
product.

-- 
With best regards,
	Konstantin Boudnik (aka Cos)

A212 4206 7EC6 F8BF 20E6  7C37 32A5 E27E 4C03 A1A1
Attention! Streams of consciousness are disallowed

Cos' pubkey: http://people.apache.org/~cos/cos.asc
