hadoop-common-user mailing list archives

From zsongbo <zson...@gmail.com>
Subject Re: setGroupingComparatorClass() or setOutputValueGroupingComparator() does not work for Combiner
Date Mon, 11 May 2009 12:59:38 GMT
Thanks Min Zhou,
I had a quick look at 0.20 last week. It looks like a big code
reorganization.
I have also read the SecondarySort class; it seems to be the same as in 0.19.

A Partitioner can only hash/partition the map output key-value pairs to
different reducers. In my code I have a partitioner which partitions the
output by the hash of userid.
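
To make that concrete, here is a minimal plain-Java sketch of the partitioning rule (not my actual Partitioner class, and it uses no Hadoop classes). The class and method names are made up for illustration, and it assumes the composite key is the string "city+userid+time" joined with a literal '+'. The modulo step is the same one Hadoop's default HashPartitioner applies to the whole key.

```java
public class UserIdPartitionSketch {

    // Assumption: the composite key looks like "city+userid+time",
    // joined with a literal '+'. The real key layout may differ.
    static String userIdOf(String compositeKey) {
        String[] parts = compositeKey.split("\\+");
        return parts[1];
    }

    // Hash only the userid part, so every record of one user lands
    // on the same reducer regardless of city or time.
    static int partitionFor(String compositeKey, int numReducers) {
        int hash = userIdOf(compositeKey).hashCode();
        // Mask the sign bit so the result is never negative.
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Both keys share userid "u42", so both lines print the same number.
        System.out.println(partitionFor("beijing+u42+1000", 4));
        System.out.println(partitionFor("shanghai+u42+2000", 4));
    }
}
```

This only decides which reducer a record goes to; it says nothing about how records are grouped inside a reducer or a combiner.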

My question is how to get the same separate-grouping feature in the
Combiner.
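
Here is a small plain-Java sketch of the behaviour I am after: sort by the full key, but start a new group only when the "city+userid" prefix changes. It is a simulation without Hadoop classes, not the Hadoop API. The names are hypothetical, and it assumes a '+'-joined key with a numeric time field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {

    // Grouping part of the key: everything before the last '+',
    // i.e. "city+userid" (assumed key layout: "city+userid+time").
    static String groupOf(String key) {
        return key.substring(0, key.lastIndexOf('+'));
    }

    // Sort comparator: full key order, "city+userid" first, then time.
    static final Comparator<String> SORT = Comparator
            .comparing((String k) -> groupOf(k))
            .thenComparingLong(k -> Long.parseLong(k.split("\\+")[2]));

    // Sort with the full comparator, then cut the sorted run into
    // groups wherever the grouping part changes.
    static List<List<String>> sortAndGroup(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<String>> groups = new ArrayList<>();
        String current = null;
        for (String k : sorted) {
            String g = groupOf(k);
            if (!g.equals(current)) {
                groups.add(new ArrayList<>());
                current = g;
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(sortAndGroup(
                List.of("bj+u1+30", "sh+u2+5", "bj+u1+10", "bj+u1+20")));
    }
}
```

In the reduce phase the grouping comparator gives me exactly this split; what I want is the same split applied to the records handed to the combiner.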

On Mon, May 11, 2009 at 8:37 PM, Min Zhou <coderplay@gmail.com> wrote:

> Hey Schubert,
>
> You need at least two new classes, a Partitioner and a Comparator, for
> different grouping and sorting.
> There is an example in Hadoop's source code that deals with this sort of
> problem. Download the latest release of Hadoop (version 0.20.0)
> and check out src/examples/SecondarySort.java.
> BTW, KeyFieldBasedPartitioner and KeyFieldBasedComparator can also solve
> this for you; however, they have some bugs.
>
>
> On Mon, May 11, 2009 at 7:42 PM, zsongbo <zsongbo@gmail.com> wrote:
>
> > Thanks Jothi,
> > For example, I have a dataset with map key="city+userid+time". The
> > output of the mapper is sorted by this map key.
> >
> > Then, I group the reduce input according to "city+userid" by defining
> > my own OutputValueGroupingComparator, which compares only "city+userid"
> > in the map key. I still want the output to be sorted by time within
> > each group.
> >
> > It works fine.
> >
> > But to improve performance, I want to use a combiner, which should also
> > group by "city+userid" while still sorting by "city+userid+time".
> >
> > I do not know if this requirement is reasonable.
> >
> >
> > Schubert
> >
> > On Thu, May 7, 2009 at 7:53 PM, Jothi Padmanabhan
> > <jothipn@yahoo-inc.com> wrote:
> >
> > > OutputValueGroupingComparator is used only at the reducer. AFAIK, I
> > > do not think you can have a different comparator for combiners.
> > >
> > > Jothi
> > >
> > >
> > > On 5/7/09 3:32 PM, "zsongbo" <zsongbo@gmail.com> wrote:
> > >
> > > > Hi all,
> > > > I have an application that needs sorting and grouping to use
> > > > different Comparators.
> > > >
> > > > I have tested this in 0.19.1 and 0.20.0, but in both it does not
> > > > work for the Combiner.
> > > >
> > > > In 0.19.1, I use job.setOutputValueGroupingComparator(), and
> > > > in 0.20.0, I use job.setGroupingComparatorClass()
> > > >
> > > > This works for the reduce phase: the reduce phase groups the keys
> > > > by the above Comparator and sorts by the default comparator of the
> > > > key class.
> > > >
> > > > But I want the combiner to use a separate comparator for grouping,
> > > > different from the one used for sorting. Is that possible?
> > > >
> > > > Schubert
> > >
> > >
> >
>
>
> Min
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
>
