hadoop-mapreduce-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aaron Kimball <aa...@cloudera.com>
Subject Re: Testing Mappers in Hadoop 0.20.0
Date Thu, 23 Jul 2009 03:21:21 GMT
er, +CC mapreduce-dev...
- A

On Wed, Jul 22, 2009 at 8:17 PM, Aaron Kimball<aaron@cloudera.com> wrote:
> +CC mapred-dev
>
> Hm.. Making this change is actually really difficult.
>
> After changing Mapper.java, I understand why this was made a
> non-static member.  By making Context non-static, it can inherit from
> MapContext<W, X, Y, Z> and bind to the type qualifiers already
> specified in the class definition. So you can't make Context static
> because it needs to inherit the type parameters from the Mapper class.
> (Since Mapper has type qualifiers, it actually can't have static inner
> classes anyway.) But why even have Context in there anyway, given that
> it does nothing beyond MapContext's implementation?
>
> We can still change the map method's signature to use MapContext.
>
> As it is, you have to write:
> class MyMapper extends Mapper<W, X, Y, Z> {
>  void map(W k, X v, Context c) {
>     ....
>  }
> }
>
> To make the change, you would now need to write:
>
> class MyMapper extends Mapper<W, X, Y, Z> {
>  void map(W k, X v, MapContext<W,X,Y,Z> c) {
>     ....
>  }
> }
>
> So I think the primary reason for the non-static inner member is to
> save you some typing. That having been said, it makes adding mock
> implementations of Context really difficult.
>
> The real reason this is a tricky change, though, is that this is
> (surprisingly, to me) an incompatible change -- even though Context
> subclasses MapContext, so it's a type-safe API change to widen the set
> of inputs we accept in map(), anyone who specified @Override on the
> method and uses Context as an argument will get a compiler error :\
> (@Override, it seems, uses the literal names even if there's a
> type-safe promotion to a method in a superclass.)
>
> Can we still make incompatible API changes on the new API, or is it
> officially frozen? If incompatible changes are allowed, I'd like to
> see this in. I think that in the interest of better
> mockability/extensibility, it'd be cleaner to ditch the inner Context
> class in favor of explicit use of MapContext. We know Java's type
> system is verbose, but that doesn't mean we should try to hack around
> it, if that means losing functionality. (I think I know how to get
> around this in MRUnit, but it'd be cleaner to not have to.)
>
> - Aaron
>
>
>
> On Wed, Jul 22, 2009 at 7:22 PM, Aaron Kimball<aaron@cloudera.com> wrote:
>> Both of those are good points. I'll submit a patch.
>> - Aaron
>>
>> On Wed, Jul 22, 2009 at 6:24 PM, Ted Dunning<ted.dunning@gmail.com> wrote:
>>> To amplify David's point, why is the argument a Mapper.Context rather than
>>> MapContext?
>>>
>>> Also, why is the Mapper.Context not static?
>>>
>>> On Wed, Jul 22, 2009 at 5:29 PM, David Hall <dlwh@cs.stanford.edu> wrote:
>>>
>>>> This is nice, but doesn't it suffer from the same problem? MRUnit uses
>>>> the mapred API, which is deprecated, and the new API doesn't use
>>>> OutputCollector, but a non-static inner class.
>>>>
>>>> -- David
>>>>
>>>> On Wed, Jul 22, 2009 at 4:52 PM, Aaron Kimball<aaron@cloudera.com>
wrote:
>>>> > Hi David,
>>>> >
>>>> > I wrote a contrib module called MRUnit
>>>> > (http://issues.apache.org/jira/browse/hadoop-5518) designed to allow
>>>> > unit tests for mappers/reducers more easily. It's slated for inclusion
>>>> > in 0.21, not 0.20 unfortunately, but you can download the patch above
>>>> > as well as MAPREDUCE-680 and build it against any earlier version of
>>>> > Hadoop. Unfortunately, it doesn't currently support the new APIs
>>>> > (e.g., with Context objects), but I imagine this could be added with
>>>> > little difficulty. I just haven't had time to do it myself ;) If you'd
>>>> > like to take a stab at it, I'd love some help!
>>>> >
>>>> > More info is at www.cloudera.com/hadoop-mrunit
>>>> > Cheers,
>>>> > - Aaron
>>>> >
>>>> > On Wed, Jul 22, 2009 at 2:49 PM, David Hall<dlwh@cs.stanford.edu>
wrote:
>>>> >> Hi,
>>>> >>
>>>> >> I'm a student working with Apache Mahout for the Google Summer of
>>>> >> Code. We recently moved to 0.20.0, and I was porting my code to
the
>>>> >> new API. Unfortunately, I (and the whole project team) seem to have
>>>> >> run into a problem when it comes to testing them.
>>>> >>
>>>> >> Historically, we would create a Mapper in a unit test, and a special
>>>> >> "DummyOutputCollector", which was essentially a multimap dressed
up to
>>>> >> conform to OutputCollector. In Hadoop 0.20.0, this isn't possible
>>>> >> anymore, because Mappers take an instance of an inner class.
>>>> >>
>>>> >> It's of course possible to dress up the Context in something else
>>>> >> (say, something just like an OutputCollector), and to specify that
>>>> >> Mahout Mappers should just delegate to a method that takes an
>>>> >> OutputCollector. But, this seems to not be very idiomatic.
>>>> >>
>>>> >> All this goes to say, what would be a "best practice" for testing
>>>> >> Mappers and Reducers in 0.20.0?
>>>> >>
>>>> >> Thanks,
>>>> >> David Hall
>>>> >>
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Ted Dunning, CTO
>>> DeepDyve
>>>
>>
>

Mime
View raw message