hadoop-mapreduce-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aaron Kimball <aa...@cloudera.com>
Subject Re: Testing Mappers in Hadoop 0.20.0
Date Thu, 23 Jul 2009 20:38:31 GMT
I hacked around this in MRUnit last night. MRUnit now has support for
the new API -- See MAPREDUCE-800.

You can, in fact, subclass Mapper.Context and Reducer.Context, since
they don't actually share any state with the outer class
Mapper/Reducer implementation, just the type signatures. But doing
this requires making a dummy "Mapper" subclass to wrap around your own
Context subclass, which is somewhat unsightly. C'est la vie.

- Aaron

On Wed, Jul 22, 2009 at 10:44 PM, David Hall<dlwh@cs.stanford.edu> wrote:
> For what it's worth, we ended up solving this problem (today) by using
> EasyMock with ClassExtension. It's an awful lot of magic, but it seems
> to work just fine for our purposes.  It would be great if doing
> bytecode weaving under the hood weren't necessary just to write test
> code, though.
>
> -- David
>
> On Wed, Jul 22, 2009 at 8:21 PM, Aaron Kimball<aaron@cloudera.com> wrote:
>> er, +CC mapreduce-dev...
>> - A
>>
>> On Wed, Jul 22, 2009 at 8:17 PM, Aaron Kimball<aaron@cloudera.com> wrote:
>>> +CC mapred-dev
>>>
>>> Hm.. Making this change is actually really difficult.
>>>
>>> After changing Mapper.java, I understand why this was made a
>>> non-static member.  By making Context non-static, it can inherit from
>>> MapContext<W, X, Y, Z> and bind to the type qualifiers already
>>> specified in the class definition. So you can't make Context static
>>> because it needs to inherit the type parameters from the Mapper class.
>>> (Since Mapper has type qualifiers, it actually can't have static inner
>>> classes anyway.) But why even have Context in there anyway, given that
>>> it does nothing beyond MapContext's implementation?
>>>
>>> We can still change the map method's signature to use MapContext.
>>>
>>> As it is, you have to write:
>>> class MyMapper extends Mapper<W, X, Y, Z> {
>>>  void map(W k, X v, Context c) {
>>>     ....
>>>  }
>>> }
>>>
>>> To make the change, you would now need to write:
>>>
>>> class MyMapper extends Mapper<W, X, Y, Z> {
>>>  void map(W k, X v, MapContext<W,X,Y,Z> c) {
>>>     ....
>>>  }
>>> }
>>>
>>> So I think the primary reason for the non-static inner member is to
>>> save you some typing. That having been said, it makes adding mock
>>> implementations of Context really difficult.
>>>
>>> The real reason this is a tricky change, though, is that this is
>>> (surprisingly, to me) an incompatible change -- even though Context
>>> subclasses MapContext, so it's a type-safe API change to widen the set
>>> of inputs we accept in map(), anyone who specified @Override on the
>>> method and uses Context as an argument will get a compiler error :\
>>> (@Override, it seems, uses the literal names even if there's a
>>> type-safe promotion to a method in a superclass.)
>>>
>>> Can we still make incompatible API changes on the new API, or is it
>>> officially frozen? If incompatible changes are allowed, I'd like to
>>> see this in. I think that in the interest of better
>>> mockability/extensibility, it'd be cleaner to ditch the inner Context
>>> class in favor of explicit use of MapContext. We know Java's type
>>> system is verbose, but that doesn't mean we should try to hack around
>>> it, if that means losing functionality. (I think I know how to get
>>> around this in MRUnit, but it'd be cleaner to not have to.)
>>>
>>> - Aaron
>>>
>>>
>>>
>>> On Wed, Jul 22, 2009 at 7:22 PM, Aaron Kimball<aaron@cloudera.com> wrote:
>>>> Both of those are good points. I'll submit a patch.
>>>> - Aaron
>>>>
>>>> On Wed, Jul 22, 2009 at 6:24 PM, Ted Dunning<ted.dunning@gmail.com>
wrote:
>>>>> To amplify David's point, why is the argument a Mapper.Context rather
than
>>>>> MapContext?
>>>>>
>>>>> Also, why is the Mapper.Context not static?
>>>>>
>>>>> On Wed, Jul 22, 2009 at 5:29 PM, David Hall <dlwh@cs.stanford.edu>
wrote:
>>>>>
>>>>>> This is nice, but doesn't it suffer from the same problem? MRUnit
uses
>>>>>> the mapred API, which is deprecated, and the new API doesn't use
>>>>>> OutputCollector, but a non-static inner class.
>>>>>>
>>>>>> -- David
>>>>>>
>>>>>> On Wed, Jul 22, 2009 at 4:52 PM, Aaron Kimball<aaron@cloudera.com>
wrote:
>>>>>> > Hi David,
>>>>>> >
>>>>>> > I wrote a contrib module called MRUnit
>>>>>> > (http://issues.apache.org/jira/browse/hadoop-5518) designed
to allow
>>>>>> > unit tests for mappers/reducers more easily. It's slated for
inclusion
>>>>>> > in 0.21, not 0.20 unfortunately, but you can download the patch
above
>>>>>> > as well as MAPREDUCE-680 and build it against any earlier version
of
>>>>>> > Hadoop. Unfortunately, it doesn't currently support the new
APIs
>>>>>> > (e.g., with Context objects), but I imagine this could be added
with
>>>>>> > little difficulty. I just haven't had time to do it myself ;)
If you'd
>>>>>> > like to take a stab at it, I'd love some help!
>>>>>> >
>>>>>> > More info is at www.cloudera.com/hadoop-mrunit
>>>>>> > Cheers,
>>>>>> > - Aaron
>>>>>> >
>>>>>> > On Wed, Jul 22, 2009 at 2:49 PM, David Hall<dlwh@cs.stanford.edu>
wrote:
>>>>>> >> Hi,
>>>>>> >>
>>>>>> >> I'm a student working with Apache Mahout for the Google
Summer of
>>>>>> >> Code. We recently moved to 0.20.0, and I was porting my
code to the
>>>>>> >> new API. Unfortunately, I (and the whole project team) seem
to have
>>>>>> >> run into a problem when it comes to testing them.
>>>>>> >>
>>>>>> >> Historically, we would create a Mapper in a unit test, and
a special
>>>>>> >> "DummyOutputCollector", which was essentially a multimap
dressed up to
>>>>>> >> conform to OutputCollector. In Hadoop 0.20.0, this isn't
possible
>>>>>> >> anymore, because Mappers take an instance of an inner class.
>>>>>> >>
>>>>>> >> It's of course possible to dress up the Context in something
else
>>>>>> >> (say, something just like an OutputCollector), and to specify
that
>>>>>> >> Mahout Mappers should just delegate to a method that takes
an
>>>>>> >> OutputCollector. But, this seems to not be very idiomatic.
>>>>>> >>
>>>>>> >> All this goes to say, what would be a "best practice" for
testing
>>>>>> >> Mappers and Reducers in 0.20.0?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> David Hall
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ted Dunning, CTO
>>>>> DeepDyve
>>>>>
>>>>
>>>
>>
>

Mime
View raw message