hadoop-mapreduce-user mailing list archives

From Don Nelson <dieseld...@gmail.com>
Subject Re: Simplifying MapReduce API
Date Tue, 27 Aug 2013 19:56:19 GMT
I agree with @Shahab - it's simple enough to implement both interfaces in one
class if that's what you want to do.  But given the distributed behavior of
Hadoop, it's likely that your mappers will be running on different nodes
than your reducers anyway - why ship around duplicate code?
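Here is a minimal sketch of the single-class approach. It uses simplified stand-in interfaces so that it compiles and runs without Hadoop on the classpath. `Emitter`, `SimpleMapper`, `SimpleReducer`, and `LocalRunner` are hypothetical names, not Hadoop APIs. The real old-API `org.apache.hadoop.mapred.Mapper` and `Reducer` interfaces also take an `OutputCollector` and a `Reporter`. The pattern depends on that older API: in `org.apache.hadoop.mapreduce`, `Mapper` and `Reducer` are classes, so a single class cannot extend both.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for Hadoop's OutputCollector: receives emitted pairs.
interface Emitter<K, V> {
    void emit(K key, V value);
}

// Hypothetical simplified mapper interface (not Hadoop's real signature).
interface SimpleMapper<KI, VI, KO, VO> {
    void map(KI key, VI value, Emitter<KO, VO> out);
}

// Hypothetical simplified reducer interface (not Hadoop's real signature).
interface SimpleReducer<K, VI, KO, VO> {
    void reduce(K key, Iterator<VI> values, Emitter<KO, VO> out);
}

// One class implementing both interfaces: word-count map and reduce logic together.
class WordCount implements SimpleMapper<Long, String, String, Integer>,
                           SimpleReducer<String, Integer, String, Integer> {
    @Override
    public void map(Long offset, String line, Emitter<String, Integer> out) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.emit(word, 1); // emit (word, 1) for every token
            }
        }
    }

    @Override
    public void reduce(String word, Iterator<Integer> counts, Emitter<String, Integer> out) {
        int sum = 0;
        while (counts.hasNext()) {
            sum += counts.next();
        }
        out.emit(word, sum); // emit (word, total)
    }
}

// Hypothetical single-process driver: map every line, group by key (sorted), then reduce.
class LocalRunner {
    static Map<String, Integer> run(WordCount job, List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            job.map(offset, line,
                    (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
            offset += line.length() + 1; // byte-offset-style key, as TextInputFormat would supply
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            job.reduce(e.getKey(), e.getValue().iterator(), result::put);
        }
        return result;
    }
}
```

For example, `LocalRunner.run(new WordCount(), List.of("the cat", "the hat"))` returns `{cat=1, hat=1, the=2}`. The same `WordCount` instance serves as both mapper and reducer. In a real cluster, though, the class would still be shipped to every node that runs either task.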

On Tue, Aug 27, 2013 at 9:48 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:

> For starters (experts might have more complex reasons), what if your
> respective map and reduce logic becomes complex enough to demand separate
> classes? Why force clients to implement both by moving them into one Job
> interface? In the current design you can always implement both (map and
> reduce) interfaces in one class if your logic is simple enough, or go the
> other route of separate classes if that is required. I think it is more
> flexible this way (you can always build up from and on top of a granular
> design, rather than the other way around).
> I hope I understood your concern correctly...
> Regards,
> Shahab
> On Tue, Aug 27, 2013 at 11:35 AM, Andrew Pennebaker <apennebaker@42six.com> wrote:
>> There seems to be an abundance of boilerplate patterns in MapReduce:
>> * Write a class extending Map (1), implementing Mapper (2), with a map
>> method (3)
>> * Write a class extending Reduce (4), implementing Reducer (5), with a
>> reduce method (6)
>> Could we achieve the same behavior with a single Job interface requiring
>> map() and reduce() methods?
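This sketch shows the proposed single `Job` interface layered on top of the granular ones, which supports the point that a combined type can be built from separate interfaces without losing anything. All names here (`MapFn`, `ReduceFn`, `Job`, `ComposedJob`, `Tokenizer`, `Summer`, `InMemoryRunner`) are hypothetical and simplified, not Hadoop APIs. Hadoop already has an unrelated `org.apache.hadoop.mapreduce.Job` class for configuring and submitting jobs. `Job` here just extends both granular interfaces, and `ComposedJob` adapts two independent classes into one `Job`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical granular interfaces (simplified; not Hadoop's real signatures).
interface MapFn<KI, VI, K, V> {
    void map(KI key, VI value, BiConsumer<K, V> out);
}

interface ReduceFn<K, V, KO, VO> {
    void reduce(K key, Iterator<V> values, BiConsumer<KO, VO> out);
}

// Hypothetical combined interface: one type that must supply both map() and reduce().
interface Job<KI, VI, K, V, KO, VO> extends MapFn<KI, VI, K, V>, ReduceFn<K, V, KO, VO> {
}

// Adapter: combines two separately written classes into one Job without changing either.
final class ComposedJob<KI, VI, K, V, KO, VO> implements Job<KI, VI, K, V, KO, VO> {
    private final MapFn<KI, VI, K, V> mapper;
    private final ReduceFn<K, V, KO, VO> reducer;

    ComposedJob(MapFn<KI, VI, K, V> mapper, ReduceFn<K, V, KO, VO> reducer) {
        this.mapper = mapper;
        this.reducer = reducer;
    }

    @Override
    public void map(KI key, VI value, BiConsumer<K, V> out) {
        mapper.map(key, value, out);
    }

    @Override
    public void reduce(K key, Iterator<V> values, BiConsumer<KO, VO> out) {
        reducer.reduce(key, values, out);
    }
}

// Separate, independently reusable classes (the granular route).
final class Tokenizer implements MapFn<Integer, String, String, Integer> {
    @Override
    public void map(Integer lineNo, String line, BiConsumer<String, Integer> out) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.accept(word, 1); // emit (word, 1) per token
            }
        }
    }
}

final class Summer implements ReduceFn<String, Integer, String, Integer> {
    @Override
    public void reduce(String word, Iterator<Integer> counts, BiConsumer<String, Integer> out) {
        int total = 0;
        while (counts.hasNext()) {
            total += counts.next();
        }
        out.accept(word, total); // emit (word, total)
    }
}

// Hypothetical in-memory driver that accepts any Job.
final class InMemoryRunner {
    static <KI, VI, K extends Comparable<K>, V, KO, VO> Map<KO, VO> run(
            Job<KI, VI, K, V, KO, VO> job, Map<KI, VI> input) {
        Map<K, List<V>> grouped = new TreeMap<>(); // keys sorted before reduce, like Hadoop's sort phase
        for (Map.Entry<KI, VI> e : input.entrySet()) {
            job.map(e.getKey(), e.getValue(),
                    (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        Map<KO, VO> result = new LinkedHashMap<>(); // preserves the sorted reduce order
        for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
            job.reduce(e.getKey(), e.getValue().iterator(), result::put);
        }
        return result;
    }
}
```

A class that prefers the single-type style can implement `Job` directly. Existing separate mapper and reducer classes can be wrapped as `new ComposedJob<>(new Tokenizer(), new Summer())`. That is why the granular interfaces are the more general building block.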


"A child of five could understand this.  Fetch me a child of five."
