hadoop-mapreduce-dev mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: MRUnit
Date Fri, 05 Mar 2010 01:23:43 GMT
Hi Chris,

The current development state is that I'm not proactively adding new code
to MRUnit, but am happy to address bugs people bring to my attention
(subject to the other demands on my work time). But I'm super-happy to help
you with pointers in patching issues you raise yourself. :)

These are all definitely legitimate issues in MRUnit that should be
addressed. At minimum, you should file issues on the Hadoop JIRA
(http://issues.apache.org/jira) to get these bugs logged. File them under
the MAPREDUCE project and tag them with the 'contrib/mrunit' component so
they're in the correct spot. I can try to add them to my work queue myself,
but they'll be addressed faster if you'd like to help contribute.

As you note, Hadoop 0.20 and the development "trunk" branch do diverge.
Apache Hadoop 0.20 does not contain an MRUnit implementation. The version
available in CDH is a backport of the trunk branch, with slight
modifications made so that it compiles against Hadoop 0.20 (you correctly
note a couple of inconsistencies in Hadoop's API that the backport needs to
work around).

The correct way to address these bugs is to check out the trunk of Hadoop
MapReduce (see the Hadoop wiki for instructions on how to set your
development environment up, using either svn or git). Then make
modifications against the trunk branch and test them there, and generate a
patch that improves MRUnit in trunk. This way MRUnit work keeps pace with the
rest of Hadoop's development. The changes should be committed to Hadoop
trunk (after posting the patch on the JIRA). Ideally you'd write a separate
patch (and have a separate JIRA filed) for each of the different issues
you've raised.

The CDH development team has an ongoing process of reviewing recent trunk
patches for backporting to the CDH build. We'll then look at how best to
backport your patches so that they apply on top of CDH. Those would then be
made available in a subsequent CDH release. It's actually extremely likely
that your changes themselves wouldn't need specific effort to backport; in
most cases involving contribs, small bugfix patches written against trunk
will apply directly on top of CDH (or you may need to change just a line or
two where a TaskType or something similar is involved).

Please ping me off-list and let me know if you've got further questions
about this, or whether you'd like some help writing bugfixes. I'm happy to
offer guidance as needed.

- Aaron Kimball

On Wed, Mar 3, 2010 at 7:53 PM, Chris White <chriswhite199@googlemail.com> wrote:

> What's the current development state of MRUnit? I'm currently using the
> 0.20.1+152 version from Cloudera, but the implementation lacks some important
> features (all relating to the new API):
>  * MapReduceDriver doesn't allow configuration of a combiner
>  * ReduceDriver doesn't allow you to configure the Reducer.Context such
> that TaskAttemptID.isMapper() returns true/false (allowing you to test a
> Reducer class that relies on this to perform different functionality
> depending on the current phase of the execution chain)
>  * None of MapDriver, ReduceDriver or MapReduceDriver allows you to
> configure the Configuration object which is presented to the mapper/reducer
> through the mocked Context object
> Looking through the SVN tree, the current MRUnit code lives in the contrib
> folder at
> http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk/src/contrib/mrunit,
> but the most recent revision (or even the base revision) won't compile against
> the 0.20.1 core (the org.apache.hadoop.mapreduce.TaskType enum is not part of
> 0.20.1).
> If I wanted to go about making these changes, where would be the best place
> to do it, bearing in mind that it would effectively be a branch of the
> 807942 revision, made to work for 0.20.1?
> Thanks
