hadoop-mapreduce-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: chained mappers & reducers
Date Wed, 20 Jan 2010 06:53:03 GMT
Can you elaborate on your case a little?
If you need a sort and shuffle (i.e., the outputs of different R1 reducer tasks must be
aggregated in some way), you have to write another map-reduce job. If you only need to process
data local to each reducer (i.e., your reducer's output key is the same as its input key), your
job would be M1-R1-M2. Essentially, in Hadoop you get exactly one sort-and-shuffle phase per job.
Note that the chain APIs are for jobs of the form (M+RM*): one or more mappers, then a single
reducer, then zero or more mappers.
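
To illustrate the M1-R1-M2 shape, here is a minimal in-memory sketch in plain Java. It does not use the Hadoop API; the class name ChainShapeSketch, the word-count logic, and the "multiply by 10" step in m2 are all made up for illustration. The point it tries to show is that there is a single grouping step (standing in for sort and shuffle), and that M2 only ever sees one R1 output record at a time, never a regrouped set of values.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of one job shaped M1-R1-M2 (not Hadoop code).
public class ChainShapeSketch {

    // M1: split each input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> m1(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return out;
    }

    // The single sort-and-shuffle stand-in: group values by key, keys sorted.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // R1: sum all values that were grouped under one key.
    static Map.Entry<String, Integer> r1(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new SimpleEntry<>(key, sum);
    }

    // M2: post-reduce mapper. It receives one R1 output record at a time and
    // keeps the key unchanged, so no second grouping step is needed.
    static Map.Entry<String, Integer> m2(Map.Entry<String, Integer> rec) {
        return new SimpleEntry<>(rec.getKey(), rec.getValue() * 10);
    }

    // Whole pipeline: M1, one shuffle, then R1 followed directly by M2.
    static List<Map.Entry<String, Integer>> run(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(m1(lines)).entrySet()) {
            out.add(m2(r1(g.getKey(), g.getValue())));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a=30, b=20, c=10]
        System.out.println(run(Arrays.asList("a b a", "b a c")));
    }
}
```

If the second step had to combine records across keys, or across the outputs of different reducer tasks, m2 could not do it in this shape; that case needs another grouping step, which in Hadoop means another job.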


On 1/20/10 2:29 AM, "Clements, Michael" <Michael.Clements@disney.com> wrote:

These two classes are not as symmetric as their names suggest.
ChainMapper does what I expected: it chains multiple map steps. But
ChainReducer does not chain reduce steps. It chains map steps to
follow a reduce step. At least, that is my understanding from the API
docs and examples I've read.

Is there a way to chain multiple reduce steps? I've got a job that
needs M-R1-R2. It currently runs as two phases: M1-R1 followed by M2-R2,
where M2 is an identity pass-through mapper. If there were a way to
chain two reduce steps the way ChainMapper chains map steps, I could
make this a one-pass job, eliminating the overhead of a second job
and all the unnecessary I/O.


Michael Clements
Solutions Architect
206 664-4374 office
360 317 5051 mobile
