hadoop-common-user mailing list archives

From Kunsheng Chen <ke...@yahoo.com>
Subject Re: Anyway to sort "keys" before Reduce function in Hadoop ?
Date Wed, 17 Jun 2009 16:34:20 GMT

Thanks, Alex! That is really helpful; at least I now know the keys are sorted in some way.

Furthermore, can I control whether the order is ascending or descending? Say my keys are
integers and I want them in descending order: is that easy to do?
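
To make the question concrete, here is a minimal sketch in plain Java of the ordering
being asked for. It uses no Hadoop classes; the class and method names are made up for
illustration. The assumption is that the equivalent in a job would be a custom key
comparator registered through JobConf.setOutputKeyComparatorClass, which is untested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: plain Java, not a Hadoop job.
class DescendingKeySketch {
    // Reverses the natural (ascending) order of Integer keys.
    static final Comparator<Integer> DESCENDING = Comparator.reverseOrder();

    // Returns a sorted copy of the keys, largest first; the input is not modified.
    static List<Integer> sortKeysDescending(List<Integer> keys) {
        List<Integer> sorted = new ArrayList<>(keys);
        sorted.sort(DESCENDING);
        return sorted;
    }

    public static void main(String[] args) {
        // Prints [10, 7, 3, 1]
        System.out.println(sortKeysDescending(Arrays.asList(3, 10, 1, 7)));
    }
}
```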


Thanks again,

-Kun

--- On Mon, 6/15/09, Alex Loddengaard <alex@cloudera.com> wrote:

> From: Alex Loddengaard <alex@cloudera.com>
> Subject: Re: Anyway to sort "keys" before Reduce function in Hadoop ?
> To: core-user@hadoop.apache.org
> Date: Monday, June 15, 2009, 11:53 PM
> Hey Kun,
> 
> Keys are passed to a given reducer instance in sorted order. That is,
> for a given reducer JVM instance, the reduce function will be called
> several times, once for each key, and the keys arrive in sorted order.
> The sorting happens in the shuffle phase, which is basically
> partitioning and sorting. That said, if you have one reducer (which
> isn't practical in large jobs), all keys will be given to you in a
> single, globally sorted order.
> 
> You may be interested in the combiner phase, which is essentially a
> mini reduce that happens before data is transferred between mapper and
> reducer:
> 
> <http://wiki.apache.org/hadoop/HadoopMapReduce> (grep for "combine")
> 
> You may also find these videos useful:
> <http://www.cloudera.com/hadoop-training-mapreduce-hdfs>
> <http://www.cloudera.com/hadoop-training-programming-with-hadoop>
> 
> Hope this helps. Let me know if I misunderstood your question.
> 
> Alex
> 
> On Mon, Jun 15, 2009 at 4:22 PM, Kunsheng Chen <keyek@yahoo.com> wrote:
> 
> >
> > Hi everyone,
> >
> > Is there any way to sort the "keys" before Reduce but after Map?
> >
> >
> > I also thought of sorting the keys myself in the Reduce function, but that
> > might take too much memory once the number of results gets large.
> >
> > I am thinking of using some numeric value as the "keys" in Reduce
> > (calculated by Map). If that is possible, I could easily output my
> > results in a particular order.
> >
> >
> > Thanks in advance,
> >
> > -Kun
> >
> >
> >
> >
> 
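
The per-reducer ordering described in the quoted reply can be sketched in plain Java as
follows. This is only an illustration under assumptions, not Hadoop code: the class and
method names are hypothetical, and the partition rule (non-negative hash modulo the number
of reducers) is assumed to behave like the default hash partitioning. Each partition is
sorted on its own, so the overall output is globally sorted only when there is one reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only: simulates partitioning and sorting in memory.
class ShuffleOrderSketch {
    // Assigns each key to a partition, then returns the distinct keys of
    // each partition in ascending order (one inner list per reducer).
    static List<List<Integer>> partitionAndSort(List<Integer> keys, int numReducers) {
        List<TreeSet<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeSet<>());
        }
        for (int key : keys) {
            // Assumed partition rule: non-negative hash modulo reducer count.
            int p = (Integer.hashCode(key) & Integer.MAX_VALUE) % numReducers;
            // TreeSet keeps keys sorted and unique: reduce is called once per key.
            partitions.get(p).add(key);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (TreeSet<Integer> part : partitions) {
            result.add(new ArrayList<>(part));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(5, 2, 9, 8, 1, 2);
        // Two reducers: each list is sorted, but not the concatenation.
        System.out.println(partitionAndSort(keys, 2)); // [[2, 8], [1, 5, 9]]
        // One reducer: a single, fully sorted list.
        System.out.println(partitionAndSort(keys, 1)); // [[1, 2, 5, 8, 9]]
    }
}
```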