hadoop-common-user mailing list archives

From madhu phatak <phatak....@gmail.com>
Subject Re: Map->Reduce->Reduce
Date Thu, 03 Feb 2011 11:16:10 GMT
The Reducer receives its <Key,Value> pairs sorted by key. If you can generate
keys in the required sort order, you can do the processing in a single
MapReduce job.
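
To illustrate the idea, here is a minimal sketch in plain Java (no Hadoop
dependency) that imitates what the shuffle does: map output is grouped by key
and handed to the reducer in comparator order. The class name SortedKeySketch,
the method names, and the unsigned byte-wise comparator are all assumptions
made for this example; they are not Hadoop APIs and not the original poster's
key logic.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedKeySketch {

    // Hypothetical key order: unsigned, lexicographic comparison of raw bytes.
    // Replace this with whatever order the final output needs.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        // A key that is a prefix of another sorts first.
        return Integer.compare(a.length, b.length);
    }

    // Stand-in for shuffle + reduce: the map side emits (sortKey, value)
    // pairs, the TreeMap plays the role of the sort, and the "reducer"
    // simply writes the values out in the order the keys arrive.
    static List<String> runJob(List<Map.Entry<byte[], String>> mapOutput) {
        Comparator<byte[]> order = SortedKeySketch::compareBytes;
        TreeMap<byte[], List<String>> shuffled = new TreeMap<>(order);
        for (Map.Entry<byte[], String> kv : mapOutput) {
            shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                    .add(kv.getValue());
        }
        List<String> output = new ArrayList<>();
        for (List<String> values : shuffled.values()) {
            output.addAll(values);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<byte[], String>> mapOutput = new ArrayList<>();
        mapOutput.add(new AbstractMap.SimpleEntry<>(new byte[] {(byte) 0x80}, "c"));
        mapOutput.add(new AbstractMap.SimpleEntry<>(new byte[] {0x01}, "a"));
        mapOutput.add(new AbstractMap.SimpleEntry<>(new byte[] {0x01, 0x00}, "b"));
        System.out.println(runJob(mapOutput)); // prints [a, b, c]
    }
}
```

In a real job the same effect would come from emitting the sort key as the map
output key and registering a matching comparator. Note that with more than one
reducer each output file is sorted only within itself, so a globally sorted
result needs either a single reducer or a partitioner that preserves key order
across reducers.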

On Tue, Jan 25, 2011 at 6:21 PM, Harsh J <qwertymaniac@gmail.com> wrote:

> Vanilla Hadoop does not support this without the intermediate I/O
> cost. You can check out the Hadoop Online Project at
> http://code.google.com/p/hop, as that does support letting a Reducer's
> output go directly to the next job's mapper (as in, a pipeline).
>
> On the topic of pipelining, also check out what's being done in Plume
> (Based on Google's FlumeJava): http://github.com/tdunning/Plume
>
> On Tue, Jan 25, 2011 at 5:16 PM, Matthew John
> <tmatthewjohn1988@gmail.com> wrote:
> > Hi all,
> >
> >
> > I am working on a MapReduce program that processes BytesWritable
> > data. Currently I run two MapReduce jobs consecutively to get the
> > final output:
> >
> > Input  ----(MapReduce1)---> Intermediate ----(MapReduce2)---> Output
> >
> > Here I am running MapReduce2 only to sort the intermediate data on the
> > basis of a Key comparator logic.
> >
> > I wanted to cut the number of MapReduce jobs down to just one, and I have
> > figured out logic to do so. The only problem is that my logic needs to
> > run a sort on the Reduce output to get the final output. The flow looks
> > like this:
> >
> > Input ----(MapReduce1)----> Output (not sorted)
> >
> > I want to know if it's possible to attach one more Reduce module to the
> > dataflow so that it can perform the inherent sort before the 2nd reduce
> > call. It would look like:
> >
> > Input --(Map)---> MapOutput ---(Reduce1)---> Output (not sorted)
> > ---(Reduce2, for which Reduce1 acts as a Mapper)---> Output
> >
> > Please let me know if there is some way to sort the output without
> > invoking a separate MapReduce job just for the sake of sorting it.
> >
> > Thanks,
> > Matthew
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>
