hadoop-common-user mailing list archives

From Chuck Lam <chuck....@gmail.com>
Subject Re: Anyway to sort "keys" before Reduce function in Hadoop ?
Date Wed, 17 Jun 2009 19:26:07 GMT
An alternative is to create a new WritableComparator and register it
on the JobConf object with the method setOutputKeyComparatorClass().
You can use IntWritable.Comparator as a starting point.

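A minimal sketch of the comparison logic such a comparator would need for descending integer keys. It uses only the Java standard library so it runs on its own; the class name DescendingIntOrder and its helper methods are hypothetical. It assumes keys are serialized as 4-byte big-endian integers. In an actual job the same logic would presumably go into the compare method of a WritableComparator subclass, which is then passed to setOutputKeyComparatorClass().

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class DescendingIntOrder {

    // Serialize an int as 4 big-endian bytes (assumed key encoding).
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decode a 4-byte big-endian int starting at offset `start`.
    static int readInt(byte[] bytes, int start) {
        return ByteBuffer.wrap(bytes, start, 4).getInt();
    }

    // Compare two serialized keys so that larger values sort first.
    // Swapping the operands of Integer.compare reverses the natural order
    // without the overflow risk of subtracting one int from the other.
    static int compareDescending(byte[] b1, int s1, byte[] b2, int s2) {
        int left = readInt(b1, s1);
        int right = readInt(b2, s2);
        return Integer.compare(right, left);
    }

    // Helper for the demo: sort plain ints through the raw comparison.
    static int[] sortDescending(int[] keys) {
        byte[][] raw = new byte[keys.length][];
        for (int i = 0; i < keys.length; i++) {
            raw[i] = serialize(keys[i]);
        }
        Arrays.sort(raw, (x, y) -> compareDescending(x, 0, y, 0));
        int[] sorted = new int[keys.length];
        for (int i = 0; i < raw.length; i++) {
            sorted[i] = readInt(raw[i], 0);
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] sorted = sortDescending(new int[] {3, -7, 42, 0});
        System.out.println(Arrays.toString(sorted)); // prints [42, 3, 0, -7]
    }
}
```

Comparing the serialized bytes directly avoids building a key object for every comparison during the sort.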

On Wed, Jun 17, 2009 at 9:37 AM, tim robertson
<timrobertson100@gmail.com> wrote:
>
> I think you can do this by creating your own key type that extends
> IntWritable and overrides the compareTo method to give the order you want.
> Cheers
>
> Tim
>
>
>
>
> On Wed, Jun 17, 2009 at 6:34 PM, Kunsheng Chen <keyek@yahoo.com> wrote:
>
> >
> > Thanks, Alex! It is really helpful; at least I know the keys are sorted in
> > some way.
> >
> > Furthermore, could I control whether the order is ascending or descending?
> > Say my keys are Integers and I want them in descending order: is that easy
> > to do?
> >
> >
> > Thanks again,
> >
> > -Kun
> >
> > --- On Mon, 6/15/09, Alex Loddengaard <alex@cloudera.com> wrote:
> >
> > > From: Alex Loddengaard <alex@cloudera.com>
> > > Subject: Re: Anyway to sort "keys" before Reduce function in Hadoop ?
> > > To: core-user@hadoop.apache.org
> > > Date: Monday, June 15, 2009, 11:53 PM
> > > Hey Kun,
> > >
> > > Keys are passed to a given reducer instance in sorted order. That is,
> > > within a given reducer JVM instance, the reduce function is called
> > > several times, once for each key, and those calls arrive in sorted key
> > > order. The sorting happens in the shuffle phase, which is basically
> > > partitioning and sorting. That said, if you have only one reducer
> > > (which isn't practical in large jobs), all keys will be given to you
> > > in sorted order.
> > >
> > > You may be interested in the combiner phase, which is
> > > essentially a mini
> > > reduce that happens before data is transferred between
> > > mapper and reducer:
> > >
> > > <http://wiki.apache.org/hadoop/HadoopMapReduce> (grep
> > > for "combine")
> > >
> > > You may also find these videos useful:
> > > <http://www.cloudera.com/hadoop-training-mapreduce-hdfs>
> > > <http://www.cloudera.com/hadoop-training-programming-with-hadoop>
> > >
> > > Hope this helps.  Let me know if I misunderstood your
> > > question.
> > >
> > > Alex
> > >
> > > On Mon, Jun 15, 2009 at 4:22 PM, Kunsheng Chen <keyek@yahoo.com>
> > > wrote:
> > >
> > > >
> > > > Hi everyone,
> > > >
> > > > Is there any way to sort the "keys" before Reduce but after Map?
> > > >
> > > >
> > > > I also thought of sorting the keys myself in the Reduce function, but
> > > > it might take too much memory once the number of results gets large.
> > > >
> > > > I am thinking of using some numeric value as the "keys" in Reduce (which
> > > > was calculated by Map). If that is possible, I could easily output my
> > > > results in some order.
> > > >
> > > >
> > > > Thanks in advance,
> > > >
> > > > -Kun
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >

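The "partitioning and sorting" behaviour described in Alex's reply quoted above can be illustrated with a small standalone sketch. This is a toy model, not Hadoop code: the class ShuffleSketch, its methods, and the modulo-based partition function are all hypothetical stand-ins. It only shows why each reducer sees its own keys in sorted order, while a total order across all keys needs a single reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class ShuffleSketch {

    // Hypothetical stand-in for hash partitioning: pick a reducer for a key.
    static int partition(int key, int numReducers) {
        return Math.floorMod(Integer.hashCode(key), numReducers);
    }

    // Route each map output key to a reducer, keeping each reducer's keys
    // sorted. A TreeSet also collapses duplicates, mirroring the fact that
    // reduce is called once per distinct key.
    static List<List<Integer>> shuffle(int[] mapOutputKeys, int numReducers) {
        List<TreeSet<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeSet<>());
        }
        for (int key : mapOutputKeys) {
            partitions.get(partition(key, numReducers)).add(key);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (TreeSet<Integer> p : partitions) {
            result.add(new ArrayList<>(p));
        }
        return result;
    }

    public static void main(String[] args) {
        int[] keys = {5, 2, 9, 4, 7, 1};
        // Two reducers: each list is sorted, but the lists interleave.
        System.out.println(shuffle(keys, 2)); // prints [[2, 4], [1, 5, 7, 9]]
        // One reducer: a single, totally ordered list.
        System.out.println(shuffle(keys, 1)); // prints [[1, 2, 4, 5, 7, 9]]
    }
}
```

With two reducers, concatenating the outputs does not give a globally sorted result, which is why the sort order set by a custom comparator applies per reducer.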