hadoop-mapreduce-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: unit tests in hadoop core
Date Fri, 27 Jul 2012 21:38:12 GMT
HBase did something similar, so I'm linking it here in case it helps
you, Andrey: https://issues.apache.org/jira/browse/HBASE-4602

On Sat, Jul 28, 2012 at 3:00 AM, Andrey Klochkov
<aklochkov@griddynamics.com> wrote:
> Hello,
>
> Testing hadoop-hdfs and hadoop-mapreduce (0.23/1.0/2.0) takes a
> noticeably long time, which has a number of obvious downsides. My
> team and I are trying to analyze the reasons and identify possible
> improvements. In particular, we noticed that over the last few years
> there have been a number of attempts to optimize and speed up the
> HDFS/MR JUnit tests, namely:
>
> 1. Introducing a unit test framework
>
> A number of pure unit tests (mock-based, non-integration) were added,
> see HDFS-669, MAPREDUCE-1050, HADOOP-6423.
>
> However, it seems that these tests are not separated from the
> integration tests (MiniCluster-based). Some of them were moved to the
> hadoop-hdfs/src/tests/unit and hadoop-mapreduce-project/src/test/unit
> directories and disabled in mavenized builds starting from 0.23. There
> was an attempt to fix this in HDFS-2276, but it's still unresolved.
>
> 2. Smoke tests (10 minutes test target)
>
> There was a successful initiative to select a subset of tests in the
> HDFS and MapReduce modules to be used as smoke tests with a running
> time under 10 minutes. The tests were chosen manually, on the
> condition that they give high code coverage in the most important
> packages/classes. This was done prior to 0.23/2.0, in the Ant builds;
> see HADOOP-5628, HDFS-458, MAPREDUCE-670.
>
> Apparently, mavenized builds do not use this feature.
>
> 3. Separating tests into categories. HADOOP-6399 - open since 2009.
>
> In general, separating tests into categories, adding fast true unit
> tests to the great coverage Hadoop already has from
> integration/component tests, and then adding sets of
> capacity/availability tests would help make Hadoop more stable and
> the development and release process less painful.
>
> So would it be useful to clean up, stabilize and enhance the
> existing unit/integration tests, and to assemble a suite of pure unit
> tests and short-running integration tests, with coverage measured for
> all three sets (unit, smoke, full)? Is it worth pursuing this? What's
> the best place to start? Is it worth completing items 1 and 2
> mentioned above? Any comments or hints would be really appreciated.
>
> --
> Andrey Klochkov
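
For illustration, the three sets mentioned in the quoted message (unit,
smoke, full) could be modelled roughly as below. This is only a sketch
under stated assumptions: the TestTier annotation, the Tier enum, the
select() helper and the placeholder test class names are all
hypothetical and are not part of Hadoop, HBase or JUnit. A real build
would more likely use the test framework's own grouping mechanism.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TestCategorySketch {

    /** Hypothetical tiers mirroring the three sets named in the thread. */
    enum Tier { UNIT, SMOKE, FULL }

    /** Hypothetical marker annotation; an assumption, not an existing API. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface TestTier {
        Tier value();
    }

    // Placeholder test classes; the names are invented for illustration.
    @TestTier(Tier.UNIT)  static class MockBasedBlockTest {}
    @TestTier(Tier.SMOKE) static class ShortMiniClusterTest {}
    @TestTier(Tier.FULL)  static class LongMiniClusterTest {}
    static class UncategorizedTest {}

    static final List<Class<?>> ALL = Arrays.<Class<?>>asList(
            MockBasedBlockTest.class,
            ShortMiniClusterTest.class,
            LongMiniClusterTest.class,
            UncategorizedTest.class);

    /** Returns the simple names of classes whose tier is at or below maxTier. */
    static List<String> select(Tier maxTier, List<Class<?>> candidates) {
        List<String> selected = new ArrayList<>();
        for (Class<?> c : candidates) {
            TestTier tier = c.getAnnotation(TestTier.class);
            // Unannotated classes are treated as FULL so they never
            // slip into a fast run by accident.
            Tier effective = (tier == null) ? Tier.FULL : tier.value();
            if (effective.ordinal() <= maxTier.ordinal()) {
                selected.add(c.getSimpleName());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println("unit:  " + select(Tier.UNIT, ALL));
        System.out.println("smoke: " + select(Tier.SMOKE, ALL));
        System.out.println("full:  " + select(Tier.FULL, ALL));
    }
}
```

Making each tier a superset of the faster one keeps the smoke run a
strict extension of the unit run, which is one possible design choice
rather than something the thread prescribes.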



-- 
Harsh J
