hadoop-hdfs-dev mailing list archives

From Andrey Klochkov <akloch...@griddynamics.com>
Subject unit tests in hadoop core
Date Fri, 27 Jul 2012 21:30:03 GMT

Testing hadoop-hdfs and hadoop-mapreduce (0.23/1.0/2.0) takes a
noticeably long time, which has a number of obvious downsides. My team
and I are trying to analyze the reasons and identify possible
improvements. In particular, we noticed that over the last few years
there have been a number of attempts to optimize and speed up the
HDFS/MR JUnit tests, namely:

1. Introducing a unit test framework

A number of pure unit tests (mock-based, non-integration) were added,
see HDFS-669, MAPREDUCE-1050, HADOOP-6423.

However, it seems that these tests are not separated from the
integration (MiniCluster-based) tests. Some of them were moved to the
hadoop-hdfs/src/tests/unit and hadoop-mapreduce-project/src/test/unit
directories and have been disabled in mavenized builds since 0.23.
There was an attempt to fix this in HDFS-2276, but it is still
unresolved.

2. Smoke tests (10 minutes test target)

There was a successful initiative to select a subset of tests in the
HDFS and MapReduce modules to be used as smoke tests with a total
running time under 10 minutes. The tests were chosen manually, on the
condition that they give high code coverage of the most important
packages/classes. This was done prior to 0.23/2.0, in the Ant builds;
see HADOOP-5628.

Apparently, mavenized builds do not use this feature.

3. Separating tests into categories. HADOOP-6399 - open since 2009.

In general, Hadoop would benefit from separating tests into
categories: fast, true unit tests in addition to the great coverage
that the existing integration/component tests already provide, plus
sets of capacity/availability tests. This would help make Hadoop more
stable and the development and release process less painful.
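
To make the idea of categories concrete, here is a minimal sketch of
tagging tests and selecting a subset to run. It is only an
illustration, not what HADOOP-6399 proposes: the annotation TestKind,
the enum Kind and the sample test methods are all hypothetical names,
and the selection is done with plain reflection so that the example
stays self-contained. In a real build, JUnit 4.8+ offers the @Category
annotation for the same purpose.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;

public class TestCategoriesSketch {

    // Hypothetical categories; the "full" set is simply all of them.
    enum Kind { UNIT, SMOKE, INTEGRATION }

    // Hypothetical marker annotation attached to each test method.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface TestKind { Kind value(); }

    // Stand-in tests; real ones would exercise HDFS/MR code.
    static class SampleTests {
        @TestKind(Kind.UNIT)
        public void joinsPathComponents() {
            String path = String.join("/", "", "user", "data");
            if (!path.equals("/user/data")) throw new AssertionError(path);
        }

        @TestKind(Kind.SMOKE)
        public void roundTripsBytes() {
            byte[] in = {1, 2, 3};
            byte[] out = in.clone();
            if (!java.util.Arrays.equals(in, out)) throw new AssertionError();
        }

        @TestKind(Kind.INTEGRATION)
        public void startsClusterPlaceholder() {
            // A MiniCluster-based test would go here; left empty in this sketch.
        }
    }

    // Returns the sorted names of test methods whose category is in 'wanted'.
    static List<String> select(Class<?> testClass, EnumSet<Kind> wanted) {
        List<String> names = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            TestKind kind = m.getAnnotation(TestKind.class);
            if (kind != null && wanted.contains(kind.value())) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Invokes the selected tests and returns how many were executed.
    static int run(Class<?> testClass, EnumSet<Kind> wanted) {
        try {
            Object instance = testClass.getDeclaredConstructor().newInstance();
            int executed = 0;
            for (String name : select(testClass, wanted)) {
                testClass.getDeclaredMethod(name).invoke(instance);
                executed++;
            }
            return executed;
        } catch (ReflectiveOperationException e) {
            // Covers both reflection problems and failures thrown by a test.
            throw new IllegalStateException("test run failed", e);
        }
    }

    public static void main(String[] args) {
        EnumSet<Kind> fast = EnumSet.of(Kind.UNIT, Kind.SMOKE);
        System.out.println("fast set: " + select(SampleTests.class, fast));
        System.out.println("executed: " + run(SampleTests.class, fast));
    }
}
```

With tags like these, a quick pre-commit run could select only the
UNIT and SMOKE sets, while the nightly build runs everything.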

So, would it be useful to clean up, stabilize and enhance the existing
unit/integration tests, and to assemble a suite of pure unit tests and
short-running integration tests, with coverage measured for all three
sets (unit, smoke, full)? Is it worth pursuing this? What's the best
place to start? Is it worth completing items 1 and 2 mentioned above?
Any comments or hints would be really appreciated.

Andrey Klochkov
