hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Usefulness of ChainMapper/ChainReducer
Date Fri, 28 Sep 2012 13:39:13 GMT


I had the same question myself at one point. However, Tom White put that
thought to rest:

"It’s possible to make map and reduce functions even more composable
than we have done. A mapper commonly performs input format parsing,
projection (selecting the relevant fields), and filtering (removing
records that are not of interest). In the mappers you have seen so
far, we have implemented all of these functions in a single mapper.
However, there is a case for splitting these into distinct mappers and
chaining them into a single mapper using the ChainMapper library class
that comes with Hadoop. Combined with a ChainReducer, you can run a
chain of mappers, followed by a reducer and another chain of mappers
in a single MapReduce job." - Tom White, Hadoop: The Definitive Guide (2nd

Personally though, I've not really used them much. They aren't anything
more than convenience methods. Not "real" chaining at the framework
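
To make the "convenience" point concrete, here is a minimal sketch in plain
Java. It does not use any Hadoop classes; the names ChainSketch, PARSE,
PROJECT, FILTER, chain, monolithic and run are hypothetical, as are the
"year,station,temp" record layout and the 9999 "missing reading" sentinel.
It only illustrates the idea from the quote: parsing, projection and
filtering written as separate stages and composed produce the same records
as one mapper that does everything inline.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ChainSketch {

    // Stage 1 (parsing): split a "year,station,temp" line; drop malformed lines.
    static final Function<String, List<String[]>> PARSE = line -> {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            return Collections.emptyList();
        }
        return Collections.singletonList(fields);
    };

    // Stage 2 (projection): keep only the year and the temperature.
    static final Function<String[], List<Map.Entry<String, Integer>>> PROJECT = fields -> {
        Map.Entry<String, Integer> e =
            new AbstractMap.SimpleEntry<>(fields[0], Integer.parseInt(fields[2].trim()));
        return Collections.singletonList(e);
    };

    // Stage 3 (filtering): drop readings that carry the assumed 9999 sentinel.
    static final Function<Map.Entry<String, Integer>, List<Map.Entry<String, Integer>>> FILTER = e -> {
        if (e.getValue() == 9999) {
            return Collections.emptyList();
        }
        return Collections.singletonList(e);
    };

    // Compose two stages: every output of the first is fed to the second.
    static <A, B, C> Function<A, List<C>> chain(Function<A, List<B>> first,
                                                Function<B, List<C>> second) {
        return input -> {
            List<C> out = new ArrayList<>();
            for (B mid : first.apply(input)) {
                out.addAll(second.apply(mid));
            }
            return out;
        };
    }

    // The three stages chained into what behaves like a single mapper.
    static final Function<String, List<Map.Entry<String, Integer>>> CHAINED =
        chain(chain(PARSE, PROJECT), FILTER);

    // The same logic written as one mapper with all the code inline.
    static List<Map.Entry<String, Integer>> monolithic(String line) {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            return Collections.emptyList();
        }
        int temp = Integer.parseInt(fields[2].trim());
        if (temp == 9999) {
            return Collections.emptyList();
        }
        Map.Entry<String, Integer> e = new AbstractMap.SimpleEntry<>(fields[0], temp);
        return Collections.singletonList(e);
    }

    // Apply a mapper to every input line and collect all emitted records.
    static List<Map.Entry<String, Integer>> run(
            Function<String, List<Map.Entry<String, Integer>>> mapper, List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            out.addAll(mapper.apply(line));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1950,A,22", "1950,B,9999", "bad line", "1949,C,111");
        System.out.println(run(CHAINED, lines));                 // [1950=22, 1949=111]
        System.out.println(run(ChainSketch::monolithic, lines)); // [1950=22, 1949=111]
    }
}
```

Both versions emit the same records, which is the sense in which chaining
is a convenience. What the split buys you is that each stage can be reused
or swapped on its own. In Hadoop itself the composition is done by the
ChainMapper and ChainReducer library classes, not by a helper like chain().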

On Fri, Sep 28, 2012 at 7:02 PM, Sigurd Spieckermann
<sigurd.spieckermann@gmail.com> wrote:
> Hi guys,
> I have stumbled upon ChainMapper and ChainReducer and I am wondering why
> they exist. I imagine that everything you can implement with ChainMapper and
> ChainReducer can be implemented with just a Mapper and a Reducer containing
> all the code of the respective chain-implementations. Or am I missing
> certain aspects about why they are more than just convenience concepts?
> Thanks for clarifying this!
> Sigurd

Harsh J
