hadoop-general mailing list archives

From: Doug Cutting <cutt...@apache.org>
Subject: Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce?
Date: Fri, 08 Aug 2008 20:06:39 GMT
Nigel Daley wrote:
> How will unit tests be divided?  For instance, will all three have to 
> have MiniDFSCluster and other shared test infrastructure?

The HDFS project can release an hdfs-test.jar file that contains 
MiniDFSCluster.  This will be used by mapred tests.  Similarly, mapred 
will release a mapred-test.jar that contains MiniMRCluster, which can be 
used by hdfs tests.  There is a circular dependency, but only in the 
test code, not in the mapred or hdfs code itself.  This is easy to 
enforce, since test code is not on the classpath when we compile 
non-test code.
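
To make that concrete, here is a rough sketch (not from the thread) of a 
mapred-side test compiling against MiniDFSCluster out of hdfs-test.jar. 
The four-argument MiniDFSCluster constructor follows the historical 
Hadoop test API; the org.apache.hadoop.hdfs package name and the test 
class itself are assumptions about how things would look after the split.

  // Sketch of a mapred test that depends only on hdfs-test.jar.
  // Package name and test class are illustrative assumptions.
  import junit.framework.TestCase;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hdfs.MiniDFSCluster;  // shipped in hdfs-test.jar

  public class TestMapredOnDfs extends TestCase {
    public void testJobSetupOnDfs() throws Exception {
      Configuration conf = new Configuration();
      // Start a two-datanode mini HDFS cluster on a freshly formatted namespace.
      MiniDFSCluster dfs = new MiniDFSCluster(conf, 2, true, null);
      try {
        FileSystem fs = dfs.getFileSystem();
        fs.mkdirs(new Path("/input"));  // stage job input on the mini cluster
        // ... start a MiniMRCluster and submit a job here ...
      } finally {
        dfs.shutdown();
      }
    }
  }

Since only the test jars cross the project boundary, each project's main 
source tree still compiles without the other.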

> -1 until I better understand the benefit of making the split.

One benefit is that developers would spend less time reading messages 
about areas they're not interested in.  The core-dev mailing list 
traffic is becoming unmanageable.  Splitting the mailing lists without 
splitting the project would leave a divided developer community trying 
to build a single coherent product, which sounds dangerous.

Another benefit is that it would increase the separation of these 
technologies, so that, e.g., folks could more easily run different 
versions of mapreduce on top of different versions of HDFS; currently 
we make no such guarantee.  Folks could then upgrade to the next 
release of mapreduce on a subset of their cluster without upgrading 
HDFS, which is not supported today.  As we move towards splitting 
mapreduce into a scheduler and a runtime, where folks can specify a 
different runtime per job, this separation will become even more critical.

We need to make this split eventually.  Why not now?

